

Applied Acoustics 70 (2009) 903–914


Journal homepage: www.elsevier.com/locate/apacoust

Perceived quality features of aircraft sounds: An analysis of the measurement characteristics of a newly created semantic differential

Martin Schütte a,*, Uwe Müller b, Stephan Sandrock c, Barbara Griefahn a, Catherine Lavandier d, Benoit Barbot d

a Leibniz-Institut für Arbeitsforschung an der Technischen Universität Dortmund, Ardeystraße 67, D-44139 Dortmund, Germany
b Deutsches Zentrum für Luft- und Raumfahrt e.V., Linder Höhe, D-51147 Köln, Germany
c Institut für angewandte Arbeitswissenschaft (IfaA), Uerdinger Str. 56, D-40474 Düsseldorf, Germany
d Laboratoire MRTE, Université de Cergy Pontoise, 5 mail Gay Lussac, Neuville sur Oise, 95031 Cergy Pontoise Cedex, France

Article info

Article history:
Received 5 February 2008
Received in revised form 16 January 2009
Accepted 23 January 2009
Available online 10 March 2009

Keywords:
Semantic differential
Sound quality feature
G-theory
Departure sounds
Arrival sounds

0003-682X/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.apacoust.2009.01.004

* Corresponding author. Tel.: +49 (0) 2311084377.
E-mail address: [email protected] (M. Schütte).

Abstract

Increasing air traffic has meant that more and more people are affected by aircraft noise. At the same time, the potential for reducing the noise level of current aircraft will be limited once recently developed techniques have been realized in practice.

Therefore, additional efforts are necessary to improve the sound quality of aircraft noise and to give aircraft manufacturers hints concerning the components that have to be acoustically optimized. The present study describes the development of an item list for a semantic differential (SD) that is appropriate for determining different perceptive features of aircraft sound quality, and focuses on estimating the reliability of this new instrument. The 10 selected SD items relate to the sound components emitted by the different technical devices of airplane engines.

The analyses of the measurement characteristics of the items were based on G-theory. This method presupposes data with interval-scale characteristics. Therefore, the rating scales used were examined with respect to this attribute, and the results verified that the data fulfill this precondition. The judgments showed large inter-individual differences. The proportion of variance that can be traced back to the factor “aircraft sound” amounted to at most one-fifth of the total variance, indicating that the differences between the aircraft sounds are rather small. Nevertheless, the findings demonstrate that it is possible to measure not all, but several decisive aspects with sufficient reliability and at acceptable expense.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

The current “Eurocontrol medium forecast of instrument flight rule movements” predicts that 11.9 million movements can be expected in Europe in 2013. This corresponds to an average annual growth of 3.4% relative to the reference year 2006 [1]. Even though such predictions rest on many assumptions, for instance about economic development and the number of operating low-cost carriers, the forecast nevertheless shows that air traffic will increase steadily, and with it the number of people affected.

Correspondingly, effort is needed to investigate additional means of improving the acceptance of aircraft noise, since the potential for reducing noise will be limited once the technical improvements currently under development have been realized in practice. One way to reduce the annoyance of people affected by aircraft noise is to remove the most disturbing aircraft sound components and to define target sounds for aircraft manufacturers which are perceived as more pleasant [2–6]. A precise description of the quality of aircraft sounds is necessary in order to obtain more detailed information about the possible technical sources of noise components perceived as disturbing. The quality of aircraft sounds is studied here within the framework of the “sound quality” approach proposed by Blauert and Jekosch [7]. More recently, Jekosch [8] defined “sound quality” as “the result of an assessment of the perceived auditory nature of a sound with respect to its desired nature”. The perceived nature depends on the perceived quality features of the sound. These features are the recognisable characteristics that can be observed or detected and named by listeners. The desired nature of a sound corresponds to the “totality of features as projected by individual expectations and/or functional requirements and/or social demands”. Accordingly, the measurement of sound quality presupposes data concerning the perceived nature of a sound, which can be collected via the semantic differential method.

The semantic differential (SD) approach, already used in psycho-acoustic research for the measurement of particular auditory characteristics [e.g. 9–11], offers the possibility to characterize aircraft sounds comprehensively, to sort sound events into a homogeneous reference system, and to make comparisons on the level of semantic descriptors [12]. Since no adequate feature list for measuring the acoustical properties of aircraft sounds is yet available, the present study aimed to develop an appropriate instrument for measuring different aspects of aircraft sound quality features and to determine the reliability of the new instrument.

Table 1
Components and emitted noise components.

| Engine component | Emitted noise components |
|---|---|
| Fan | Tones of all frequencies, buzz-saw, broadband noise |
| Compressor | Tones (high frequencies), broadband noise |
| Combustor | Tones (low frequencies), broadband noise |
| Turbine | Tones (high frequencies), broadband noise: high frequencies |
| Jet noise | Broadband noise: low frequencies |
| Airframe | Broadband noise: mid to high frequencies |

Table 2
Descriptive nouns of temporal and timbre concepts.

| Temporal aspect: duration | Timbre aspect: pitch |
|---|---|
| Start | High pitched noise |
| End | Low pitched noise |
| Progression | Frequencies |
| Length | Tonality |
| Resonance | Tone |
| Speed | Low tones |
| Approach | |

| Temporal aspect: discontinuity | Timbre aspect: texture of noise |
|---|---|
| Homogeneity [non-] | Whistling noise |
| Fluctuations | Buzz |
| Regularity | Blowing |
| Disruptions | Screeching |
| Variations | Humming |
| Blips [sound-] | Rumbling or booming |

Table 3
Items of the semantic differential.

| Category | Adjective pair 1 | Adjective pair 2 |
|---|---|---|
| Pitch | Low tone–high tone | Not booming–booming |
| Broadband | Not hissing–hissing | Not rumbling–rumbling |
| Buzz-saw noise | Not buzzing–buzzing | |
| Impression of speed | Slow flying–fast flying | |
| Modulation | Unsteady–steady | |
| Power | Not powerful–powerful | |
| Safety | Not safe–safe | |
| Comfort | Unbearable–bearable | |

2. Development of the SD: a preliminary verbalization test

The SD to be developed should provide engineers with information concerning the technical components of a jet and propeller engine and, if possible, the airframe. Accordingly, the characteristics of the noise emitted by the different engine components and the airframe had to be taken into consideration during item selection. Correspondingly, the SD should be able to map tonal components, broadband noise (which includes the airframe noise) and the buzz-saw component of aircraft sounds (see Table 1).

Furthermore, the aircraft sound is also shaped by thermal modulation effects, Doppler and ground reflection effects, and the sound level [13]. Therefore these aspects, especially the modulation effects, the Doppler effect and the sound level, must also be considered.

To select the most appropriate adjective pairs for the SD, a preliminary experiment based on verbalizations in French was conducted. The data analysis was done with the help of a linguistic expert. Ten male and four female students (22–27 years) who had never participated in any sound quality experiment completed this test, which was divided into two sessions. In the verbalization experiment, 18 pairs of aircraft sounds were presented successively in a laboratory through headphones. The pairs were presented in random order, but each time with the same reference aircraft sound presented first. All stimuli were equalized to LAmax = 84.5 ± 0.5 dB(A). For each pair, the subjects had to rate their preference on a seven-point scale and, in addition, had to answer an open questionnaire that asked for more detailed information on the reasons for choosing a particular rating and on the parameters the judgment was based on. Furthermore, the participants were asked to characterize the sounds in their own words and to name words adequate for describing the differences between the two sounds. The verbalization focused on the characterization of sounds in a laboratory context. Participants were not asked to imagine themselves in a real situation around airports. The nouns and adjectives extracted from the verbalizations were categorized using an onomasiological procedure in order to identify items relevant for the SD. This procedure consists of classifying words as members of the same concept [14,15]. These concepts are not presupposed but are revealed by the linguistic analysis. The overall list of words consisted of 104 nouns with altogether 446 entries and 107 adjectives with altogether 372 mentions. Apart from the words “sound” and “noise”, with 105 and 36 entries respectively, the descriptive nouns of noise could be classified with regard to the concepts of loudness, temporality and timbre. Table 2 presents the items named most frequently.

The same classification was used for the adjectives [16]. Some adjectives, however, are more easily used in an SD than nouns. For example, to record the impression of aircraft speed, the adjective fast, which is the most frequently occurring adjective on the duration aspect, has been used in the SD. For the same reason, the adjective steady, which is the most frequently occurring adjective on the discontinuity aspect, has been selected. The words finally selected are shown in Table 3. They are generally those most frequently mentioned by the subjects, representative of perceptual aspects related to technical components for the timbre concepts, but also representative of the other aspects (temporality, loudness, affectivity), which are related to the context of the flight. The first perceptive characteristics of aircraft sounds related to the technical components (Table 1) are linked to the high frequencies. The perceptual qualification high tone and its opposite low tone have been selected to characterize the pitch of the sound. To record the perceptive aspect of high-frequency broadband noise, the word hissing has been preferred to the word blowing (Table 2).

Combustor and jet noises are characterized by low frequencies. The word booming relates to low pitch and the word rumbling is connected with broadband noise. The “buzz-saw” noise, generated by the fan when the rotor tip speed becomes supersonic, is related to the pair not buzzing–buzzing. The two dimensions of the temporal aspect are characterized by the pairs slow–fast and steady–unsteady. To avoid subjects confusing the item with a loudness item, the word powerful has been preferred to loud. Finally, the affective perception of aircraft sound focuses on two aspects. The first one was revealed by the noun analysis and concerns the feeling of danger. Therefore, the adjective safe has been chosen to record this feeling. The second one concerns the hedonic aspect of the perception, since the verbalization derives from a preference assessment. The word unpleasant, which is the most frequently occurring adjective on this hedonic judgment, has been changed into unbearable in order to avoid the word pleasant as the opposite adjective on the comfort scale. The word bearable has accordingly been preferred to the word pleasant.

Fig. 1. Certification points (flyover) for starting and landing aircraft.

3. Selected aircraft sounds

The aircraft considered represent a sample of passenger airplanes currently in use. The aircraft sounds were recorded near Munich and Stansted airports, close to the aircraft certification point distance (see Fig. 1).

It was necessary to select recording sites where disturbing background noise could be avoided as far as possible. For the recordings, a binaural system with a linear free-field response was installed. In addition, an array of ground microphones was used for acoustical distance measurements. All aircraft events were photographed in order to calculate their altitude and to identify their registration numbers, from which the aircraft type could be determined using the exact flight schedules of the respective airport.

From all these recordings, 16 different sounds of starting airplanes and 15 sounds of landing airplanes were selected, comprising business, regional, long range twin and quad jets with by-pass ratios from less than 3 up to 9. Furthermore, propeller airplanes were also included (see Table 4). For the core examination of the SD, all sounds were edited to a standard length of 40 s and were equalized to the same effective perceived noise level (EPNL), the standard measure used for the noise certification of aircraft. The equalization level was 89.5 EPNdB, which equates to maximum sound levels from 75.4 to 83.5 dB(A).

Table 4
Used departure and arrival aircraft sounds (D: departure sounds, A: arrival sounds, AP: aircraft).

Columns: by-pass ratio R, followed by a D and an A column each for business jet, regional jet, single aisle, long range twin and long range quad. Entries are listed per row in column order; empty cells are omitted.

R < 3: AP 1 | AP 2 | AP 3
3 < R < 5: AP 4, AP 5 | AP 1, AP 2, AP 8 | AP 6
R = 6: AP 17 | AP 15 | AP 7, AP 8 | AP 3, AP 10 | AP 9, AP 10 | AP 4, AP 11 | AP 11, AP 12 | AP 5, AP 6
8 < R < 9: AP 16 | AP 14
Props: AP 13, AP 14 | AP 7, AP 12, AP 13
Single aisle (A) to long range quad columns, row R < 3 or 3 < R < 5: AP 9

4. Methodological procedure

The reliability analysis of the SD is based on generalizability theory (GT). GT has the advantage of allowing the magnitude of multiple sources of measurement error to be estimated, which facilitates the separation of the major error sources [17,18]. In G-theory every measurement is regarded as a sample taken from a universe of possible measurements which could have been taken for an object of measurement (e.g. an airplane sound). This universe is named the universe of admissible observations and is characterized by various variables, called facets, which represent possible sources of variability in the measured scores. Each specific instance of a facet is called a condition. The mean of all measurements for one object of measurement across all conditions of all facets represents the universe score for that object of measurement. However, this value cannot be observed. An analysis based on G-theory indicates how well conclusions can be drawn from the observed scores to the universe scores. The application of G-theory requires two different steps, namely a G-study and a D-study. The G-study has to be carried out first.

The G-study aims to estimate the magnitude of the sources of variability as defined in the universe of admissible observations. This is done using analysis of variance (ANOVA), a group of statistical procedures in which the observed variability of scores (variance) is partitioned into components due to different independent variables, here the facets. The ANOVA procedure provides estimated variance components [19] reflecting the contribution of each facet to measurement variability. Subsequently a D-study (decision study) has to be carried out, which is based on the results of the G-study. The D-study gives information about the generalizability of the measurements. For that purpose two different reliability-like parameters can be computed. The so-called relative G-coefficient (ρ²) indicates on a 0–1 scale how well an observed score is likely to locate the objects of measurement relative to other members of the corresponding population. The absolute G-coefficient (Φ) indicates, also on a 0–1 scale, how well an observed score is likely to locate the objects of measurement without regard to others in the respective population.

Furthermore, the D-study provides not only indications of the expenditure connected with applying the measurement procedure (e.g. the optimal number of conditions of a facet) but also information about alternative measurement designs that yield reliable scores (e.g. a facet explaining only a very small proportion of the variability in measurements need not be considered). The application of G-theory presupposes that the measurements have interval-scale properties. An interval scale is a scale in which the distance between any two adjacent scores is the same as the distance between any other two adjacent scores, and whose zero point does not represent an absolute zero. Compared with item response theory, another widely used approach in the development of questionnaires, GT has the advantage of permitting easy generalization to multifaceted universes [17].

In the present study the sound of an airplane represented the object of measurement, since the SD should give information concerning the sound features of aircraft noises. Moreover, the individual level of noise sensitivity can be considered one essential source of measurement error, since it is well known that subjective sound evaluations may be affected by this individual attribute [20,21]. Noise sensitivity is regarded as a stable personality trait affecting the reactivity towards noise sources [22,23]. This characteristic can be measured with the “Noise Sensitivity Questionnaire” (NoiSeQ) [24]. The NoiSeQ is available in different languages (French, German, Italian, Hungarian, Swedish, English); the translations can be presumed to be comparable to each other, as an existing study of the psychometric characteristics of the six language versions verified [25]. The global sensitivity score (GSS) of the NoiSeQ possesses a reliability of at least 0.9 for relative as well as for absolute decisions [24,25]. Besides this, the evaluation of the various aircraft sounds could be influenced by individual scoring behavior; some judges may, for example, avoid extreme values. Therefore the judges represented a further source of measurement error which should be taken into consideration. By contrast, gender, age and culture seem to have no substantial influence on sound evaluations, as earlier tests showed [3]. Therefore these variables need not be considered as potential sources of measurement error. Because the items of the SD capture distinct features of an airplane sound, averaging over items does not make conceptual sense here. Therefore the G-analyses were carried out separately for each SD item.

Since the application of G-theory presupposes data at interval-scale level, the scale characteristics of the measurements should be tested. The necessary procedure encompasses two different steps. First, starting from the observed rating values, the scale values were determined as mean values. This procedure presupposes that the data already possess interval-scale properties. In addition, the scale values were estimated according to the method of successive intervals (MSI). This approach assumes only that the data have ordinal-scale characteristics, but supplies scale values with interval-scale attributes, for which an iterative estimation procedure can be used [26]. The approach is based on the cumulative Gaussian distribution and assumes that the standard deviations of the various category boundaries are identical and that the correlations between category boundaries and judged objects are constant. Even though no statistical test exists for examining the goodness of fit of the scaling results, it is usual to report the average absolute deviation between the observed cumulative percentages and the cumulative percentages predicted from the scaling results [27]. The applied 7-level rating scale (−3 to +3) can be regarded as an interval scale if the scale values obtained by averaging can be transformed linearly into the scale values calculated on the basis of the MSI [28].
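The interval-scale check described above can be illustrated with a minimal sketch. It is not the iterative estimation procedure of [26]; it assumes the simplest least-squares variant of the method of successive intervals (equal dispersions, category boundaries taken as the average z-value across sounds), and the function names and the ratings are made up for illustration. A correlation close to 1 between mean ratings and MSI values indicates that one set of scale values is approximately a linear transformation of the other.

```python
from statistics import NormalDist, mean


def msi_scale_values(ratings_by_sound, categories):
    """Simplified method of successive intervals (equal-dispersion assumption).

    ratings_by_sound: one list of category ratings per sound.
    categories: the ordered rating categories, e.g. -3 .. +3.
    Returns one scale value per sound, centred on zero.
    """
    norm = NormalDist()
    z_rows = []
    for ratings in ratings_by_sound:
        n = len(ratings)
        row = []
        for upper in categories[:-1]:  # the last category always reaches 100 %
            p = sum(r <= upper for r in ratings) / n
            p = min(max(p, 0.5 / n), 1 - 0.5 / n)  # clip so inv_cdf stays finite
            row.append(norm.inv_cdf(p))
        z_rows.append(row)
    n_bounds = len(categories) - 1
    # Category boundaries: average z-value of each boundary across all sounds
    boundaries = [mean(row[k] for row in z_rows) for k in range(n_bounds)]
    # Scale value of a sound: average distance between boundary and its z-value
    return [mean(boundaries[k] - row[k] for k in range(n_bounds)) for row in z_rows]


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up ratings of three sounds on the 7-level scale (-3 .. +3)
ratings = [
    [-3, -2, -2, -1, -1, -1, 0, 0, 1],
    [-1, 0, 0, 0, 1, 1, 1, 2, 2],
    [0, 1, 1, 2, 2, 2, 3, 3, 3],
]
mean_ratings = [mean(r) for r in ratings]
msi_values = msi_scale_values(ratings, list(range(-3, 4)))
r = pearson_r(mean_ratings, msi_values)
print(f"correlation between mean ratings and MSI values: {r:.3f}")
```

The clipping of cumulative proportions is a pragmatic choice of this sketch to handle unused categories; other treatments of empty categories are possible.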

Fig. 2. Correlation between average rating values and scale values per item (departure sounds).

5. Core examination of the SD: experimental procedure

The whole experimental procedure was computer controlled and encompassed two different sessions, one with the presentation of the departure sounds and the other with the presentation of the arrival sounds. Each session was limited to approximately 90 min in order to avoid fatigue or reduced motivation of the participants. The sessions took place on two different days. One half of the subjects started with the evaluation of the departure sounds and then switched to the arrival sounds. The other half of the sample made the judgments in reverse order.

In the first session the participants had to answer a general questionnaire asking for socio-demographic facts. Next they had to fill in the NoiSeQ. Each session started with a mood questionnaire, followed by the presentation of 6 artificial sounds (broadband noise plus the respective sound feature) to facilitate the comprehension of the items “high tone”, “unsteady”, “buzzing”, “hissing”, “booming” and “rumbling”. These simple synthetic sounds were used in order to exclude a possible preconditioning of later ratings. After these example sounds, four real aircraft sounds were presented to familiarize the participants with the use of the SD when evaluating an aircraft sound. All sounds were presented via open headphones (AKG 501). The ten adjective pairs of the SD were shown on one screen in random order. Each aircraft sound was played back repeatedly during its evaluation with the SD items. There was no time restriction for giving the ratings. The subjects had to click a box to indicate their judgment. The study was conducted in seven different laboratories, where the same experimental procedure and technical equipment were used to guarantee comparable conditions.

6. Departure sounds

6.1. Testing the scale characteristic of the SD items

The analysis of the properties of the rating scales used is based on a sample of 72 persons from 7 European countries, 32 women and 40 men, with younger (<40 years) and older (>40 years) persons in equal shares. None of the participants lived in the vicinity of an airport, so that they would not be prejudiced against aircraft sounds. Based on the experimentally obtained ratings, the scale value of each departure sound was determined as the average judgment of the participants per item. Besides this, the scale values of the sounds were estimated using the method of successive intervals (see Fig. 2). The examination of the quality of the MSI scaling results verified that there is a reasonable match between the observed and predicted cumulative percentages, since the deviations take only small values in the range between 1.75% (item “safe”) and 2.37% (item “booming”). The correlation coefficients between the mean ratings and the MSI scale values varied in the range between 0.974 and 0.996 (see Fig. 2).

Accordingly, the close relationship between both scaling results justified the conclusion that the applied rating scales correspond to an interval scale. Therefore G-theory could be applied directly to the original rating values.

Table 6
D-study results (σ²: estimated variance component; %: explained proportion of total variance).

| Item | Person (P) σ² | Person (P) % | Aircraft sound (A) σ² | Aircraft sound (A) % | Interaction P×A, error σ² | Interaction P×A, error % |
|---|---|---|---|---|---|---|
| Low/high tone | 0.27 | 11.49 | 0.41 | 17.45 | 1.67 | 71.06 |
| Booming | 0.71 | 33.65 | 0.08 | 3.79 | 1.32 | 62.56 |
| Powerful | 0.51 | 28.18 | 0.16 | 8.84 | 1.14 | 62.98 |
| Rumbling | 0.67 | 34.18 | 0.13 | 6.63 | 1.16 | 59.18 |
| Buzzing | 0.49 | 19.44 | 0.28 | 11.11 | 1.75 | 69.44 |
| Slow/fast | 0.37 | 16.97 | 0.36 | 16.51 | 1.45 | 66.51 |
| Unsteady/steady | 0.47 | 19.92 | 0.37 | 15.68 | 1.52 | 64.41 |
| Unsafe/safe | 0.67 | 34.54 | 0.04 | 2.06 | 1.23 | 63.40 |
| Hissing | 0.56 | 25.00 | 0.17 | 7.59 | 1.51 | 67.41 |
| Bearable | 0.54 | 26.34 | 0.13 | 6.34 | 1.38 | 67.32 |

6.2. G-study

The G-study, which indicates the proportion of variance explained by the facets “person” and “noise sensitivity”, took into account the data from the 72 persons who had already participated in the tests concerning the characteristics of the rating scales. Using the median score of the norm sample, the participants were classified into persons of low and high sensitivity [24]. The two facets could not be completely crossed with each other because a judge belongs to only one sensitivity group. Therefore the G-study is based not on a completely crossed but on a nested design, with persons nested within the levels of the factor “noise sensitivity”. In addition, the properties of the ANOVA model must be specified. All facets were treated as random effects. Such a definition is appropriate since the intention is to generalize beyond the persons participating in the study to other persons.

Furthermore, the criterion used for dividing the judges according to their level of noise sensitivity is only one of several possible criteria, and therefore this facet also represents a random effect. Five variance components (σ²) could be estimated in this model, namely “noise sensitivity (S)”, “person (P:S)”, “airplane sound (A)”, the interaction between noise sensitivity and airplane sound (S×A), and the interaction between person and aircraft sound (P×A:S), which is confounded with all further unmeasured variation (see Table 5).
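The five components named above correspond to the usual linear model for a design in which persons are nested within sensitivity groups and crossed with sounds. The notation below is a sketch in standard G-theory style rather than the authors' own notation; the effect symbols ν are introduced here for illustration only.

```latex
\begin{align}
X_{pa} &= \mu + \nu_{s} + \nu_{p:s} + \nu_{a} + \nu_{sa} + \nu_{pa:s,e} \\
\sigma^2(X_{pa}) &= \sigma^2_{s} + \sigma^2_{p:s} + \sigma^2_{a} + \sigma^2_{sa} + \sigma^2_{pa:s,e} \\
\%_i &= 100 \cdot \frac{\sigma^2_i}{\sigma^2(X_{pa})},
\qquad i \in \{s,\; p{:}s,\; a,\; sa,\; pa{:}s,e\}
\end{align}
```

Here X_{pa} is the rating of person p (belonging to sensitivity group s) for sound a, and the last line gives the percentage of total variance attributed to each component.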

The variance which could be traced back to the factor “airplane sound”, that is, the object of measurement, showed considerable variation. The item “booming” did not adequately differentiate between the various sounds, since the proportion of variance which could be traced back to the factor “sound” had a value of only 1.96%. For the other items the proportion of explained variance attained a more satisfactory level, ranging from 5.48% (item “rumbling”) to 15.04% (item “low/high tone”). Furthermore, the judgments showed remarkable inter-individual differences, which were smallest for the item “low/high tone” (9.61%) and largest for the item “powerful” (23.53%). Overall, the facet “noise sensitivity” had no substantial influence on the ratings. Likewise, the interaction “S×A” did not explain a significant proportion of variance. The greatest estimated variance components, however, resulted for the residual term. Accordingly, more than 65% of the variance was attributable to the interaction “P×A:S” and/or other unsystematic as well as systematic sources of variation that were not measured in this study. Based on the results of the G-study, the use of the SD does not require noise sensitivity to be taken into account, because the measurements are not influenced by this facet.

Table 5
G-study results (σ²: estimated variance component; %: explained proportion of total variance).

| Item | Noise sensitivity (S) σ² | Noise sensitivity (S) % | Person:S (P:S) σ² | Person:S (P:S) % |
|---|---|---|---|---|
| Low/high tone | 0.00 | 0.00 | 0.24 | 9.61 |
| Booming | 0.06 | 2.23 | 0.53 | 19.59 |
| Powerful | 0.00 | 0.00 | 0.47 | 23.53 |
| Rumbling | 0.00 | 0.00 | 0.44 | 18.55 |
| Buzzing | 0.00 | 0.00 | 0.43 | 12.03 |
| Slow/fast | 0.00 | 0.00 | 0.43 | 12.03 |
| Unsteady/steady | 0.00 | 0.00 | 0.45 | 16.54 |
| Unsafe/safe | 0.00 | 0.00 | 0.41 | 20.90 |
| Hissing | 0.00 | 0.19 | 0.39 | 16.10 |
| Bearable | 0.02 | 1.06 | 0.40 | 17.65 |

6.3. D-study

The aim of the following D-study was to get information con-cerning the reliability of the different items of the SD taking intoconsideration only the relevant source of measurement errornamely the facet ‘‘person”. The D-study was based on 27 femalesand 25 males as well as younger and older persons who werenot involved in previous tests. Again data analysis assumed a com-plete random ANOVA model.

The proportion of variance explained by the factor ‘‘airplanesound” varied extensively between the various items (see Table6). The rating values of the items ‘‘booming” and ‘‘safety” were onlymarginally influenced by the airplane sounds as the explained per-centage of variance was only 3.79 and 2.06%, respectively. For theremaining items the corresponding proportions took values be-tween 6.34% (‘‘bearable”) and 17.45% (‘‘low/high tone”).

Furthermore the evaluations were largely affected by the facet‘‘person” as the explained proportions of variance varied between11.49% (‘‘low/high tone”) and 34.54% (‘‘safety”).

The most striking variance component resulted again for theresidual since 62.56 (‘‘booming”) up to 71.06% (‘‘low/high tone”)of data variance can be traced back to the interaction ‘‘P�A” andfurther uncontrolled effects.

nce).

Airplane sound (A) S*A P*A:S, error

r2 (%) r2 (%) r2 (%)

0.38 15.04 0.03 1.37 1.85 73.980.05 1.96 0.01 0.49 2.04 75.730.14 7.14 0.00 0.21 1.37 69.130.13 5.48 0.01 0.57 1.80 75.390.38 10.65 0.10 2.73 2.66 74.580.38 10.65 0.10 2.73 2.66 74.580.25 9.43 0.04 1.64 1.96 72.390.11 5.90 0.00 0.00 1.42 73.200.20 8.12 0.00 0.00 1.85 75.600.14 6.39 0.02 0.88 1.68 74.02


Table 7
Relative (ρ²) and absolute (Φ) G-coefficients for a single measurement per item.

| Item | ρ² (N = 1) | Φ (N = 1) |
|---|---|---|
| Low/high tone | 0.20 | 0.17 |
| Slow/fast | 0.20 | 0.16 |
| Unsteady/steady | 0.20 | 0.16 |
| Buzzing | 0.14 | 0.11 |
| Powerful | 0.13 | 0.09 |
| Hissing | 0.10 | 0.08 |
| Rumbling | 0.10 | 0.06 |
| Bearable | 0.08 | 0.06 |
| Booming | 0.06 | 0.04 |
| Unsafe/safe | 0.03 | 0.02 |

908 M. Schütte et al. / Applied Acoustics 70 (2009) 903–914

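Starting from the estimated variance components, the relative (ρ²) and absolute (Φ) G-coefficients were calculated according to Eqs. (1) and (2).

$$
\rho^2 = \frac{\sigma^2_A}{\sigma^2_A + \dfrac{\sigma^2_{A\times P}}{n_P}} \tag{1}
$$

$$
\Phi = \frac{\sigma^2_A}{\sigma^2_A + \dfrac{\sigma^2_P}{n_P} + \dfrac{\sigma^2_{A\times P}}{n_P}} \tag{2}
$$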

For a single measurement, that is the evaluation of one aircraft sound by one judge (n_P = 1), both ρ² and Φ fell below the lower limit of 0.70 recommended by DIN EN ISO 10075-3 [29] as the minimum requirement for reliability (see Table 7).

Accordingly, the number of measurements necessary to meet the standards recommended by ISO 10075-3 for orienting (0.70 < ρ², Φ < 0.80), screening (0.80 < ρ², Φ < 0.90) and precision (0.90 < ρ², Φ) measurements was estimated (Table 8). The sample size required to obtain measurements characterized by a relative G-coefficient of 0.70 ranged from 10 (items “low/high tone”, “slow/fast”, “unsteady/steady”) to 75 judges (item “safe”). If a reliability of 0.80 is aimed at, the number of judges rises to 17 and 128, respectively. A precision measurement presupposes a sample of 37 and 287 judges, respectively. The required sample sizes for an adequate absolute reliability are somewhat higher and amount to 12–116 judges for orienting, 19–198 for screening and 43–444 for precision measurements.
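The step from the single-judge coefficients to the required number of judges can be made explicit by solving Eq. (1) for n_P. The derivation below is a sketch under the assumption that the required n_P is the smallest integer for which the coefficient reaches the target value; the symbols $\rho^2_1$ and $\Phi_1$ (coefficients for n_P = 1) and $\rho^2_{\mathrm{target}}$ are introduced here for illustration.

$$
\frac{\sigma^2_A}{\sigma^2_A + \sigma^2_{A\times P}/n_P} \ge \rho^2_{\mathrm{target}}
\;\Longleftrightarrow\;
n_P \ge \frac{\rho^2_{\mathrm{target}}}{1-\rho^2_{\mathrm{target}}}\cdot\frac{\sigma^2_{A\times P}}{\sigma^2_A}
= \frac{\rho^2_{\mathrm{target}}}{1-\rho^2_{\mathrm{target}}}\cdot\frac{1-\rho^2_1}{\rho^2_1}
$$

The absolute coefficient is handled in the same way, with $\sigma^2_P + \sigma^2_{A\times P}$ in place of $\sigma^2_{A\times P}$ and $\Phi_1$ in place of $\rho^2_1$. As a check against the reported values, the item “low/high tone” has $\rho^2_1 = 0.20$, so a target of 0.70 gives

$$
n_P \ge \frac{0.70}{0.30}\cdot\frac{0.80}{0.20} \approx 9.3 ,
$$

which rounds up to the 10 judges reported for this item. Because the published single-judge coefficients are rounded to two decimals, recomputing other entries this way may differ by one judge from the tabulated numbers.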

The measurement expenditure differs remarkably between the items of the SD. The item “safe” is especially striking, since even an orienting measurement presupposes a sample size of 75 (relative) or 116 judges (absolute). Nevertheless, the remaining items require a considerably smaller number of judges if a relative or absolute reliability of 0.70 or 0.80 is aimed at.

6.4. Mean values and confidence intervals

Table 8
Necessary number of judges in dependence of different reliability levels.

| Item | ρ² = 0.70 | ρ² = 0.80 | ρ² = 0.90 | Φ = 0.70 | Φ = 0.80 | Φ = 0.90 |
|---|---|---|---|---|---|---|
| Low/high tone | 10 | 17 | 37 | 12 | 19 | 43 |
| Slow/fast | 10 | 17 | 37 | 12 | 21 | 46 |
| Unsteady/steady | 10 | 17 | 37 | 13 | 22 | 49 |
| Buzzing | 15 | 25 | 56 | 19 | 32 | 91 |
| Powerful | 17 | 28 | 63 | 24 | 41 | 91 |
| Hissing | 21 | 36 | 80 | 29 | 49 | 109 |
| Rumbling | 22 | 37 | 83 | 34 | 59 | 131 |
| Bearable | 26 | 44 | 98 | 36 | 61 | 136 |
| Booming | 38 | 64 | 144 | 58 | 98 | 220 |
| Unsafe/safe | 75 | 128 | 287 | 116 | 198 | 444 |

In order to obtain an indication of the informational value of the SD, the average values as well as the corresponding confidence intervals were calculated for each airplane from the evaluations of the 52 judges. In doing so, only those items of the SD were taken into consideration which exceed the lower limit of 0.70 for absolute decisions (absolute G-coefficient). The confidence intervals were computed on the basis of the so-called conditional standard errors of measurement (SEM), since the SEM varies as a function of the true score and observed score [30]. The confidence intervals were estimated at the 95% level (see Fig. 3).
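How such an interval could be obtained for one aircraft sound is sketched below. The paper does not state which variant of the conditional SEM was used; the sketch assumes the conditional absolute SEM for a single-facet crossed design described in [30], which for one object of measurement reduces to the standard error of the mean across judges, and it assumes a normal quantile of 1.96 for the 95% level. The function name `conditional_ci` and the ratings are hypothetical.

```python
import math


def conditional_ci(ratings, z=1.96):
    """Mean rating of one aircraft sound with an approximate 95% interval.

    Assumption: conditional absolute SEM of a single-facet crossed design,
    sqrt(sum((x - mean)^2) / (n * (n - 1))), i.e. the standard error of the
    mean across judges. The variant used in the paper is not stated.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two judges are required")
    mean = sum(ratings) / n
    sum_sq = sum((x - mean) ** 2 for x in ratings)
    sem = math.sqrt(sum_sq / (n * (n - 1)))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical ratings of one sound by eight judges (not data from the study).
example = [1, 2, 0, 1, -1, 2, 1, 2]
mean, lower, upper = conditional_ci(example)
print(f"mean={mean:.2f}, 95% CI=[{lower:.2f}, {upper:.2f}]")
```

Because the SEM is computed separately for every sound, the interval width differs from sound to sound, which is the property the conditional approach is meant to capture.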

At first it is striking that the average rating values vary throughout in a range of approximately ±1 scale level about the midpoint of the rating scale. Correspondingly, the differences between the various aircraft are rather small. Nevertheless the sample size (N = 52) was large enough to allow a reliable measurement of eight aspects of the perceived nature of sounds.

Regarding the item “bearable”, the mean values of two airplanes, namely No. 11 (long range quad jet, R = 6) and No. 2 (regional jet, R < 3), fall below the mid-scale point (mean values: −0.67, −0.54), so that these sounds can be considered as rather unbearable. On the other hand the airplane rated best (No. 16) is a long range quad jet possessing a very high by-pass ratio in the range between 8 and 9 (mean value: 0.88).

Furthermore, none of the 17 airplanes is evaluated as “not” or at least “less” buzzing. The highest average value resulted here for airplane No. 2 (mean value: 1.54), a regional jet with a small by-pass ratio (R < 3). Considering the item “hissing”, the mean rating of only one airplane (No. 14), a propeller aircraft, falls below the indifference point (mean value: −0.77).

Moving on to the tonal components of aircraft noise, three aircraft are characterized by a sound with mainly low tonal components (mean values: −1.37, −0.54, −0.54), that is to say two propeller aircraft (No. 14, 3) and a single aisle aircraft with a by-pass ratio of about 6 (No. 7).

Examining the attribute “powerful”, airplane No. 14 (propeller) is the only one characterized by a negative average value (−0.71), that is to say it is perceived as less powerful. The sounds of all 17 aircraft are perceived as rumbling, with the highest average value (mean value: 1.21) resulting for airplane No. 6 (single aisle, 3 < R < 5). Aircraft No. 14 (propeller, mean value: −1.54), No. 7 (single aisle, R = 6, mean value: −0.65), No. 3 (single aisle, R < 3, mean value: −0.63), No. 9 (long range twin, R = 6, mean value: −0.56) and No. 13 (propeller, mean value: −0.44) are rated as rather slow, whereas aircraft No. 11 (long range quad, R = 6) attained the maximal positive value (mean value: 1.04), indicating that this airplane is evaluated as rather fast. The sounds of four aircraft, namely No. 7 (single aisle, R = 6, mean value: −0.92), No. 1 (business jet, R < 3, mean value: −0.56), No. 8 (single aisle, R = 6, mean value: −0.56) and No. 15 (long range quad, 8 < R < 9, mean value: −0.40), are rated as unsteady. On the other hand, aircraft No. 16 is the one possessing the highest constancy (long range quad, 8 < R < 9, mean value: 1.38).

All in all, aircraft No. 14, a propeller airplane, is the one characterized by the minimal average value on four SD items. The sound of this aircraft is not hissing, possesses low tonal components and is perceived as not powerful as well as slow. Beyond this, the calculated mean ratings and the confidence intervals demonstrate that a differentiation between the various sounds is possible although the differences are rather small.

Fig. 3. Mean rating values and confidence intervals per airplane (departure sounds).

7. Arrival sounds

7.1. Testing the scale characteristic of the SD items

In order to assure that the ratings of the arrival sounds also possess interval-scale characteristics, the same methodological procedure was used for testing the scale properties as was already applied when examining the departure sounds. The analysis is based on a sample of 60 persons (25 female, 35 male) from seven European countries, comprising approximately equal shares of younger and older persons. The evaluation of the quality of the MSI-scaling results was again based on the mean absolute deviation between the observed and predicted cumulative percentages of the judgments. This parameter took values in the range between 1.76% (item “powerful”) and 2.37% (item “low/high tone”), indicating an acceptable quality of the scaling results.

The correlations between the mean rating values and those obtained via the method of successive intervals took values in the range between 0.973 and 0.998 (see Fig. 4).

Therefore the conclusion was justified that the ratings had interval-scale properties and that the application of G-theory was thus admissible.


Fig. 4. Correlation between the average rating values and the scale values per item (arrival sounds).


7.2. G-study

The G-study was based on the data from the 60 persons already used for testing the characteristics of the rating scales. The model again comprised the factor “aircraft sound” as the object of measurement and the factors “person” and “noise sensitivity (low/high)” as facets, with persons nested within the levels of the factor “noise sensitivity”. All facets were treated as random effects. Regarding the factor “aircraft sound”, the items “powerful” and “safe” did not adequately differentiate between the various sounds, since only 1.48% and 3.88% of variance, respectively, can be traced back to this factor (see Table 9).

The sound-related variance of the other items attained a more satisfactory level, varying between 7.55% (item “buzzing”) and 19.48% (item “low/high tone”). Furthermore, the factor “person” had a pronounced effect on the ratings, as demonstrated by the explained proportion of variance, which ranged from 14.06% (item “low/high tone”) to 41.98% (item “safe”). The facet “noise sensitivity” has only a negligible influence on the ratings. Likewise the interaction S×A did not explain a significant proportion of variance. However, the largest proportion of variance, namely more than 50%, could be traced back to the interaction P×A:S and/or other unsystematic as well as systematic sources of variation that were not measured in this study.

Table 9
G-study results (σ²: estimated variance component; %: explained proportion of total variance). S: noise sensitivity; P:S: person nested within noise sensitivity; A: airplane sound.

| Item | S σ² | S % | P:S σ² | P:S % | A σ² | A % | S×A σ² | S×A % | P×A:S, error σ² | P×A:S, error % |
|---|---|---|---|---|---|---|---|---|---|---|
| Low/high tone | 0.00 | 0.00 | 0.36 | 14.06 | 0.49 | 19.48 | 0.00 | 0.00 | 1.69 | 66.45 |
| Booming | 0.03 | 1.00 | 0.55 | 21.43 | 0.27 | 10.54 | 0.05 | 1.79 | 1.66 | 65.24 |
| Powerful | 0.00 | 0.00 | 0.79 | 36.18 | 0.03 | 1.48 | 0.00 | 0.00 | 1.36 | 62.34 |
| Rumbling | 0.00 | 0.00 | 0.90 | 32.93 | 0.29 | 10.48 | 0.00 | 0.00 | 1.55 | 56.59 |
| Buzzing | 0.00 | 0.00 | 1.18 | 36.34 | 0.24 | 7.55 | 0.00 | 0.00 | 1.81 | 56.11 |
| Slow/fast | 0.03 | 1.21 | 0.43 | 17.56 | 0.42 | 17.47 | 0.00 | 0.00 | 1.55 | 63.76 |
| Unsteady/steady | 0.00 | 0.00 | 0.58 | 23.49 | 0.36 | 14.54 | 0.01 | 0.31 | 1.53 | 61.66 |
| Unsafe/safe | 0.00 | 0.00 | 0.77 | 41.98 | 0.07 | 3.88 | 0.01 | 0.70 | 0.98 | 53.44 |
| Hissing | 0.01 | 0.56 | 0.33 | 16.97 | 0.25 | 12.52 | 0.00 | 0.00 | 1.37 | 69.96 |
| Bearable | 0.05 | 2.26 | 0.69 | 29.68 | 0.19 | 8.44 | 0.00 | 0.00 | 1.38 | 59.62 |

The results of the G-study were comparable to those obtained for the departure sounds, as the findings again demonstrated that the noise sensitivity of the judges did not contribute to an important extent to measurement error. Therefore noise sensitivity can remain unconsidered when measuring the perceived auditory nature of arrival sounds.

7.3. D-study

The D-study was based on a new sample of 66 persons (33 female, 33 male) who were not involved in previous tests. The data analysis was based on a two-factorial completely random effects model encompassing the factor “aircraft sound” as the object of measurement and the factor “person” as the only dominant source of measurement error. Again the items “powerful” and “safe”, and additionally “booming”, did not show substantial variation depending on the various aircraft sounds, since only 2.54%, 2.94% and 3.92% of variance, respectively, were explained by this factor (see Table 10). The other items were more responsive to the various sounds, as demonstrated by the corresponding proportions of variance, which varied between 6.29% (“buzzing”) and 16.53% (“slow/fast”).

The proportion of variance that could be traced back to the factor “person” varied between 13.94% (“low/high tone”) and 43.14% (“safe”) and was therefore, for most items, greater than the variance explained by the factor “sound”. Nevertheless, well over 50% of variance stemmed from the interaction P×A as well as further uncontrolled effects.

Table 10
D-study results (σ²: estimated variance component; %: explained proportion of total variance).

| Item | Person (P) σ² | Person (P) % | Aircraft sound (A) σ² | Aircraft sound (A) % | Interaction P×A, error σ² | Interaction P×A, error % |
|---|---|---|---|---|---|---|
| Low/high tone | 0.35 | 13.94 | 0.36 | 14.34 | 1.80 | 71.71 |
| Booming | 1.10 | 38.77 | 0.11 | 3.92 | 1.63 | 57.31 |
| Powerful | 0.66 | 33.50 | 0.05 | 2.54 | 1.26 | 63.96 |
| Rumbling | 0.79 | 30.38 | 0.24 | 9.23 | 1.57 | 60.38 |
| Buzzing | 1.10 | 36.42 | 0.19 | 6.29 | 1.73 | 57.28 |
| Slow/fast | 0.68 | 28.81 | 0.39 | 16.53 | 1.29 | 54.66 |
| Unsteady/steady | 0.66 | 27.73 | 0.26 | 10.92 | 1.46 | 61.34 |
| Unsafe/safe | 0.88 | 43.14 | 0.06 | 2.94 | 1.10 | 53.92 |
| Hissing | 0.48 | 22.43 | 0.16 | 7.48 | 1.50 | 70.09 |
| Bearable | 0.58 | 26.85 | 0.15 | 6.94 | 1.43 | 66.20 |

Table 12
Different levels of reliability and necessary number of judges.

| Item | ρ² = 0.70 | ρ² = 0.80 | ρ² = 0.90 | Φ = 0.70 | Φ = 0.80 | Φ = 0.90 |
|---|---|---|---|---|---|---|
| Slow/fast | 8 | 14 | 30 | 12 | 21 | 46 |
| Low/high tone | 12 | 21 | 46 | 15 | 25 | 55 |
| Unsteady/steady | 14 | 23 | 51 | 19 | 33 | 74 |
| Rumbling | 16 | 24 | 61 | 24 | 41 | 91 |
| Buzzing | 22 | 37 | 82 | 35 | 60 | 134 |
| Hissing | 22 | 37 | 83 | 29 | 49 | 109 |
| Bearable | 23 | 38 | 85 | 31 | 54 | 120 |
| Booming | 35 | 59 | 132 | 58 | 99 | 221 |
| Unsafe/safe | 45 | 76 | 171 | 80 | 137 | 307 |
| Powerful | 55 | 94 | 210 | 83 | 142 | 320 |

Based on the estimated variance components, the relative and absolute G-coefficients for a single measurement were calculated. Considering all 10 items, the maximum value amounted to 0.23 (ρ²) and 0.17 (Φ), respectively, and thus fell clearly below the lower limit of 0.70 (Table 11).
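The calculation of both coefficients from the variance components can be illustrated with a few lines of Python. This is a minimal sketch of Eqs. (1) and (2), not the software used in the study; the function name `g_coefficients` is hypothetical, and the inputs are the rounded components published for the item “slow/fast” in Table 10.

```python
def g_coefficients(var_a, var_p, var_pa, n_p=1):
    """Relative (rho^2) and absolute (Phi) G-coefficients, Eqs. (1) and (2).

    var_a  : variance component of the aircraft sound (object of measurement)
    var_p  : variance component of the facet person
    var_pa : person x sound interaction, confounded with residual error
    n_p    : number of judges whose ratings are averaged
    """
    if n_p < 1:
        raise ValueError("n_p must be at least 1")
    relative_error = var_pa / n_p            # error for relative decisions
    absolute_error = (var_p + var_pa) / n_p  # error for absolute decisions
    rho2 = var_a / (var_a + relative_error)
    phi = var_a / (var_a + absolute_error)
    return rho2, phi


# Rounded variance components for "slow/fast" as published in Table 10.
rho2, phi = g_coefficients(var_a=0.39, var_p=0.68, var_pa=1.29)
print(round(rho2, 2), round(phi, 2))  # 0.23 0.17
```

With a single judge the function reproduces the maximum values quoted above. Passing a larger `n_p` shows how averaging over judges shrinks both error terms and raises the coefficients.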

Therefore the number of measurements needed for an orienting, screening and precision measurement was again estimated (see Table 12). For an orienting measurement the required number of persons varies between 8 (item “slow/fast”) and 55 (item “powerful”) for the relative G-coefficient, and between 12 and 83 judges for the absolute G-coefficient. A measurement intended for screening purposes requires between 14 and 94 (ρ²) or between 21 and 142 judges (Φ).

If the items of the SD are used for precision measurements, the number of raters varies between 30 and 210 (ρ²) and between 46 and 320 (Φ).
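The required numbers of judges can be recomputed from the variance components by solving Eqs. (1) and (2) for n_P. The sketch below assumes that the smallest integer reaching the target coefficient is reported; the function name `judges_needed` is hypothetical, and the inputs are again the rounded “slow/fast” components from Table 10.

```python
import math


def judges_needed(var_a, error_var, target):
    """Smallest number of judges for which the G-coefficient reaches target.

    error_var is the single-judge error variance: the interaction component
    for the relative coefficient, person plus interaction for the absolute
    coefficient. Assumes the tabulated value is the smallest such integer.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if var_a <= 0.0:
        raise ValueError("var_a must be positive")
    required = target / (1.0 - target) * error_var / var_a
    # Small tolerance so that floating-point noise does not push an exact
    # integer result up to the next integer.
    return math.ceil(required - 1e-9)


# Rounded variance components for "slow/fast" as published in Table 10.
var_p, var_a, var_pa = 0.68, 0.39, 1.29
targets = (0.70, 0.80, 0.90)
relative = [judges_needed(var_a, var_pa, t) for t in targets]
absolute = [judges_needed(var_a, var_p + var_pa, t) for t in targets]
print(relative)  # [8, 14, 30]
print(absolute)  # [12, 21, 46]
```

For this item the output agrees with the corresponding row of Table 12. For other items the rounded components can yield a result that differs by one judge from the table, presumably because the published numbers were derived from unrounded estimates.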

The necessary sample sizes to obtain reliable measurements show large differences between the various SD items. This finding is comparable to the results already available for the departure sounds. The greatest expenditure arises if the characteristics “safe” and “powerful” are measured, especially for evaluations adequate for screening or high precision purposes.

7.4. Mean values and confidence intervals

Starting from the evaluations of the 66 judges, the mean rating values as well as the corresponding conditional confidence intervals (p = 0.05) were again calculated for all items characterized by an absolute G-coefficient of 0.70 or more.

Table 11
Relative (ρ²) and absolute (Φ) G-coefficients per item.

| Item | ρ² (N = 1) | Φ (N = 1) |
|---|---|---|
| Slow/fast | 0.23 | 0.17 |
| Low/high tone | 0.17 | 0.14 |
| Unsteady/steady | 0.15 | 0.11 |
| Rumbling | 0.13 | 0.09 |
| Hissing | 0.10 | 0.08 |
| Bearable | 0.10 | 0.07 |
| Buzzing | 0.10 | 0.06 |
| Booming | 0.06 | 0.04 |
| Unsafe/safe | 0.05 | 0.03 |
| Powerful | 0.04 | 0.03 |

Concerning the item “bearable”, only airplane No. 6 (see Fig. 5), which is a long range quad jet with a by-pass ratio (R) of 6, possesses an average rating value falling below the indifference level of the applied rating scale (0) and can be considered as rather unbearable (mean value: −0.80). Furthermore, this aircraft sound attains the maximal average rating value for the item “hissing” (mean value: 2.09) and is characterized by high tonal components (mean value: 2.06). There is no aircraft sound which can be presumed to be not hissing, and likewise there is no sound characterized by low tonal components.

However, the noise of aircraft No. 14, a long range quad jet with a by-pass ratio between 8 and 9, may be classified as not booming (mean value: −0.58), not buzzing (mean value: −0.88) and not rumbling (mean value: −0.73), but as fast (mean value: 1.33) as well as steady (mean value: 1.48). The perception of airplane No. 7, which is a single aisle aircraft with a by-pass ratio of about 6, is the other way around. This airplane is evaluated as booming (mean value: 0.80), buzzing (mean value: 1.02), rumbling (mean value: 1.09), slow (mean value: −0.94) and unsteady (mean value: −0.71).

Additionally, airplane No. 5, a long range quad jet with a by-pass ratio of 0.6, is also rated as slow. Furthermore, a long range twin jet (No. 4) and a business jet (No. 15), both with a by-pass ratio of 6, were also evaluated as not booming.

Again the findings show, even though the deviations are rather small, that a differentiation between the sounds is possible if a sample of adequate size is used. Furthermore, the findings demonstrate that the sounds of aircraft equipped with the same number of turbines and characterized by a comparable by-pass ratio are evaluated differently. Therefore, other noise sources, as for example the airframe, have a non-negligible influence on sound perception.

Fig. 5. Mean rating values and confidence intervals per airplane (arrival sounds).

8. Summary and discussion

The present study aimed at the development of an SD measuring the quality features of aircraft sounds which can be used in further investigations designed for improving technical components of an aircraft. For this development it is necessary to determine the reliability of the SD items. In total 10 items were identified which were related to different sound characteristics of the technical parts of an airplane. The analysis of the measurement characteristics of the different items was based on G-theory. This method presupposes data possessing interval-scale-level characteristics. Therefore, the rating scales used were examined with respect to this attribute. The results verified that the data fulfill this precondition. The samples used in this study consist only of non-airport residents in order to avoid judgment biases resulting from negative attitudes concerning air traffic and aircraft noise. Since some of the SD items were not self-explanatory, a pre-training of the participants was necessary. For this purpose artificial sounds were presented to the subjects to ensure that this preliminary practice does not lead to response biases during the evaluation of the original aircraft sounds.

The G-study took into consideration the individual noise sensitivity and the judges as facets; the findings showed that noise sensitivity represented no substantial source of measurement error. Nevertheless, the results provided evidence that the judgments are subject to large inter-individual differences. Based on the results of pre-studies, the assumption is justified that the occurring inter-individual differences are not due to the age or gender of the participants [3,6]. Furthermore, all subjects were familiarized not only with sound characteristics like buzzing, hissing etc. but also with the usage of the rating scales in order to reduce the influence of response sets (e.g. avoiding extreme values, tendency towards the mean value). For limiting inter-individual variance in the usage of rating scales there exists the general recommendation to standardize the individual judgment rules, for example by using a special sound as a reference. Such a procedure was not carried out because all results would be valid only with respect to the chosen sound, and thus the informational value of the findings would be limited. Furthermore, such a standardization may be difficult to establish if the sound evaluations are carried out in field settings. Even if it seems quite attractive to use forced choice procedures like the paired comparison method to eliminate inter-individual differences in scale usage, the computation of scale values based on the paired comparisons of the complete sample of judges presupposes that the judgments made by every judge fulfill the criterion of consistency [31]. Furthermore, the judges should show very similar judgment behavior leading to a high group concordance [32]. Based on the results obtained for the SD items it seems improbable that the application of the method of paired comparisons would have been more adequate. One reason is that small differences between the objects judged can lead to inconsistencies in the judgments [33], and the obtained findings show that the airplane sounds deviate only to a small degree from each other. The second argument is that the high proportion of variance which can be traced back to the facet “judges” argues for a low group concordance.

Alternatively, the high proportion of inter-individual variance could also be an indication that the judges were not able to evaluate the aircraft sounds in as differentiated a manner as the provided rating scale presumed, and thus that the rating task was too difficult [34]. In this case a more intensive rater training could be advantageous.

Despite the small discrepancies between the aircraft sounds and the high proportion of inter-individual variance, the findings of the D-studies demonstrate that it is possible to measure not all but most of the sound features with reasonable expenditure. If a reliability of 0.70 is adequate for measuring the subjectively perceived quality features of departure and arrival sounds, fewer than 30 judges are necessary for most of the SD items. Nevertheless, taking into consideration the proportion of variance explained by the factor “aircraft sound” as well as the calculated relative and absolute G-coefficients, it seems reasonable to discount the item “safe” when measuring the perceived nature of arrival as well as departure sounds, even if this feeling is considered as a mediator in field surveys [35]. Regarding the evaluation of arrival as well as departure sounds, the item “booming” should also be skipped because in this case the proportion of variance explained by the factor “airplane sound” took a value of less than 5 percent. For the same reason the measurement of the item “powerful” should be omitted when the analysis of arrival sounds is intended. These recommendations are valid for equalized sounds.

The measurement characteristics of the SD were determined using a random sample of non-airport residents as judges. Such a procedure has the advantage that the results of the reliability studies are presumably not much affected by negative preconceptions concerning air transport and aircraft noise. Nevertheless, a comparison of non-airport residents with airport residents concerning the perceived quality of airplane sounds could be helpful in order to ascertain whether a separation of both groups is really necessary.

There are several ways to use the SD items in studies aiming at information about more accepted aircraft sounds. One possibility is to use the newly created SD in combination with a scale measuring for instance “pleasantness” [3]. Such a pleasantness scale allows subjects to evaluate the desired nature [8] of aircraft sounds. The perceived nature and desired nature of sounds can be linked to evaluate the sound quality of aircraft. First hints concerning the aspects of an aircraft sound which are important for sound improvement can be deduced from the correlations between the pleasantness ratings and the evaluations of the SD items. Furthermore, the SD could be used to gain information concerning the minimum extent to which a component of a sound (e.g. “buzz-saw”) must be modified so that the change can still be perceived by listeners. Such studies would be helpful for engineers since they provide some evidence about the limit beyond which further changes of the sound are not meaningful because perceptual effects fail to appear. Beyond this, sound quality studies may be useful for a long-term objective, namely the charging of airport fees that take into account not only the sound pressure level but also other annoyance-related aspects of aircraft sounds such as sound quality. Since such measurements have financial consequences, the subjective evaluations should possess the highest possible accuracy, that is a reliability satisfying the requirements for precision measurements, while remaining feasible with adequate expenditure.

Acknowledgements

This work has been carried out within the EC-6th framework program SEFA (Sound Engineering for Aircraft, coordinated by EADS, Munich). The support of the EU is therefore gratefully acknowledged. Special thanks to the following SEFA partners who contributed to the presented results with their advice and laboratory examinations: BUTE (Hungary), CRF (Italy), ISVR (United Kingdom), KTH (Sweden), LMS (Belgium), and SASS (Germany) for recording the used aircraft sounds.

References

[1] European Organization for the Safety of Air Navigation. EUROCONTROL Medium-Term Forecast: IFR Flight Movements 2007–2013. EATMP Infocentre; 2007.

[2] Guski R. Psychological methods for evaluating sound quality and assessing acoustic information. Acta Acoust 1997;83:765–74.

[3] Müller U, Schütte M. Sound engineering for Aircraft (SEFA), first results of listening examinations. In: Inter-Noise 2006 “Engineering a quieter world”. Honolulu, Hawaii; 2006.

[4] Quehl J. Comfort studies on aircraft interior sound and vibration. Unpublished Ph.D. Thesis. Oldenburg: Carl von Ossietzky Universität; 2001.

[5] Quehl J, Schick A, Mellert V, Schulte-Fortkamp B. Evaluation of combined aircraft interior sound and vibration effects on passengers’ well-being and comfort sensation: The elaboration of a concept-specific methodological instrument. In: Schick A, Meis M, Reckhardt C, editors. Contributions to psychological acoustics. Results of the eighth Oldenburg symposium on psychological acoustics. Oldenburg: BIS; 2000.

[6] Schütte M, Müller U, Drobietz R. An European Project – Sound Engineering for Aircraft (SEFA). In: Pikaar RN, Koningsveld EAP, Settels PJM, editors. Meeting diversity in ergonomics. Amsterdam: Elsevier; 2006.

[7] Blauert J, Jekosch U. Sound quality evaluation: a multi layered problem. Acta Acust United Acust 1997;83:747–53.

[8] Jekosch U. Basic concepts and terms quality, reconsidered in the context of product-sound quality. Acta Acust United Acust 2004;90:999–1006.

[9] Farina A, Ugolotti E. Subjective evaluation of the sound quality in cars by the auralisation technique. In: Proceedings of fourth international conference and exhibition “comfort in the automotive industry” – Bologna (Italy); October 2–3, 1997.

[10] Raimbault M, Lavandier C, Berengier M. Ambient sound assessment of urban environments: field studies in two French cities. Appl Acoust 2003;64:1241–56.

[11] Yong Jeon J, You J, Yeon Chang H. Sound radiation and sound quality characteristics of refrigerator noise in real living environments. Appl Acoust 2007;68:1118–34.

[12] Schick A. Schallwirkung aus Psychologischer Sicht. Stuttgart: Klett-Cotta; 1979.

[13] Janssens K, Vecchio A, van der Auweraer H. Synthesis and sound quality evaluation of exterior and interior aircraft noise. Aerospace Sci Technol 2007. doi:10.1016/j.ast.2007.10.00.

[14] Koch P. Lexical typology from a cognitive and linguistic point of view. In: Cruse A, Hundsnurche F, Job M, Lutzeier PR, editors. Lexicology: an international handbook on the nature and structure of words and vocabularies, vol. 1. Berlin: Walter de Gruyter; 2002.

[15] Koch P, Blank A. Kognitive romanische onomasiologie und semasiologie. In: Blank A, Koch P, editors. Linguistische Arbeiten. Tübingen: Niemeyer; 2003.

[16] Barbot B, Lavandier C, Cheminée P. Perceptual representation of aircraft sounds. Appl Acoust 2008;69:1003–16.

[17] Brennan RL. Generalizability theory. New York: Springer; 2001.

[18] Cronbach LJ, Gleser GC, Nanda H, Rajaratnam N. The dependability of behavioral measurements: theory of generalizability for scores and profiles. New York: John Wiley; 1972.


[19] Cornfield J, Tukey JW. Average values of mean squares in factorials. Ann Math Statist 1956;27:907–49.

[20] Job RFS. Noise sensitivity as a factor influencing human reaction to noise. Noise Health 1999;3:57–68.

[21] Miedema HME, Vos H. Noise sensitivity reactions to noise and other environmental conditions. J Acoust Soc Am 2003;113:1492–504.

[22] Ellermeier W, Eigenstetter M, Zimmer K. Psychoacoustic correlates of individual noise sensitivity. J Acoust Soc Am 2001;109:1464–73.

[23] Stansfeld SA. Noise, noise sensitivity and psychiatric disorder. Psychol Med: Monogr 1992(Suppl. 22).

[24] Schütte M, Marks A, Wenning E, Griefahn B. The development of the noise sensitivity questionnaire. Noise Health 2007;9:15–24.

[25] Sandrock S, Schütte M, Griefahn B. The reliability of the noise sensitivity questionnaire in a cross-national analysis. Noise Health 2007;9:8–14.

[26] Diederich GW, Messick SJ, Tucker LR. A general least squares solution for successive intervals. Psychometrika 1957;22:159–73.

[27] Sydow H, Petzold P. Mathematische psychologie. Berlin: Springer; 1982.

[28] Tränkle U. Auswirkungen der Gestaltung der Antwortskala auf quantitative Urteile. Zeitschrift für Sozialpsychologie 1987;18:88–99.

[29] DIN EN ISO 10075-3. Ergonomic principles related to mental workload – Part 3: Measurement and assessment of mental workload. Berlin: Beuth; 2004.

[30] Brennan RL. Raw-score conditional standard errors of measurement in generalizability theory. Appl Psychol Meas 1998;22:307–31.

[31] Gerard HB, Shapiro HN. Determining the degree of inconsistency in a set of paired comparisons. Psychometrika 1958;23:33–46.

[32] Kendall MB, Babington Smith B. On the method of paired comparisons. Biometrika 1940;31:324–45.

[33] Hill RJ. A note on inconsistency in paired comparison judgments. Am Sociological Rev 1953;18:564–6.

[34] Pitrella FD, Käppler WD. Identification and evaluation of scale design principles in the development of the extended range, sequential judgment scale. Wachtberg: Forschungsinstitut für Anthropotechnik; 1988.

[35] Guski R. Personal and social variables as co-determinants of noise annoyance. Noise Health 1999;1:45–56.