CHAPMAN & HALL/CRC, A CRC Press Company
Boca Raton  London  New York  Washington, D.C.

Interdisciplinary Statistics

STATISTICS IN MUSICOLOGY

Jan Beran

©2004 CRC Press LLC



This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice:

Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2004 by Chapman & Hall/CRC

No claim to original U.S. Government works
International Standard Book Number 1-58488-219-0

Library of Congress Card Number 2003048488
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0

Printed on acid-free paper

Library of Congress Cataloging-in-Publication Data

Beran, Jan, 1959-
    Statistics in musicology / Jan Beran.
        p. cm. — (Interdisciplinary statistics series)
    Includes bibliographical references (p. ) and indexes.
    ISBN 1-58488-219-0 (alk. paper)
    1. Musical analysis—Statistical methods. I. Title. II. Interdisciplinary statistics

MT6.B344 2003
781.2—dc21    2003048488


Contents

Preface

1 Some mathematical foundations of music
  1.1 General background
  1.2 Some elements of algebra
  1.3 Specific applications in music

2 Exploratory data mining in musical spaces
  2.1 Musical motivation
  2.2 Some descriptive statistics and plots for univariate data
  2.3 Specific applications in music – univariate
  2.4 Some descriptive statistics and plots for bivariate data
  2.5 Specific applications in music – bivariate
  2.6 Some multivariate descriptive displays
  2.7 Specific applications in music – multivariate

3 Global measures of structure and randomness
  3.1 Musical motivation
  3.2 Basic principles
  3.3 Specific applications in music

4 Time series analysis
  4.1 Musical motivation
  4.2 Basic principles
  4.3 Specific applications in music

5 Hierarchical methods
  5.1 Musical motivation
  5.2 Basic principles
  5.3 Specific applications in music

6 Markov chains and hidden Markov models
  6.1 Musical motivation
  6.2 Basic principles
  6.3 Specific applications in music

7 Circular statistics
  7.1 Musical motivation
  7.2 Basic principles
  7.3 Specific applications in music

8 Principal component analysis
  8.1 Musical motivation
  8.2 Basic principles
  8.3 Specific applications in music

9 Discriminant analysis
  9.1 Musical motivation
  9.2 Basic principles
  9.3 Specific applications in music

10 Cluster analysis
  10.1 Musical motivation
  10.2 Basic principles
  10.3 Specific applications in music

11 Multidimensional scaling
  11.1 Musical motivation
  11.2 Basic principles
  11.3 Specific applications in music

List of figures

References


Preface

An essential aspect of music is structure. It is therefore not surprising that a connection between music and mathematics was recognized long before our time. Perhaps best known among the ancient “quantitative musicologists” are the Pythagoreans, who found fundamental connections between musical intervals and mathematical ratios. An obvious reason why mathematics comes into play is that a musical performance results in sound waves that can be described by physical equations. Perhaps more interesting, however, is the intrinsic organization of these waves that distinguishes music from “ordinary noise”. Also, since music is intrinsically linked with human perception, emotion, and reflection as well as the human body, the scientific study of music goes far beyond physics. For a deeper understanding of music, a number of different sciences, such as psychology, physiology, history, physics, mathematics, statistics, computer science, semiotics, and of course musicology – to name only a few – need to be combined. This, together with the lack of available data, prevented, until recently, a systematic development of quantitative methods in musicology. In the last few years, the situation has changed dramatically. Collection of quantitative data is no longer a serious problem, and a number of mathematical and statistical methods have been developed that are suitable for analyzing such data. Statistics is likely to play an essential role in future developments of musicology, mainly for the following reasons: a) statistics is concerned with finding structure in data; b) statistical methods and structures are mathematical, and can often be carried over to various types of data – statistics is therefore an ideal interdisciplinary science that can link different scientific disciplines; and c) musical data are massive and complex – and therefore basically useless, unless suitable tools are applied to extract essential features.

This book is addressed to anybody who is curious about how one may analyze music in a quantitative manner. Clearly, the question of how such an analysis may be done is very complex, and no ultimate answer can be given here. Instead, the book summarizes various ideas that have proven useful in musical analysis and may provide the reader with “food for thought” or inspiration to do his or her own analysis. Specifically, the methods and applications discussed here may be of interest to students and researchers in music, statistics, mathematics, computer science, communication, and engineering. There is a large variety of statistical methods that can be applied in music. Selected topics are discussed in this book, ranging from simple descriptive statistics to formal modeling by parametric and nonparametric processes. The theoretical foundations of each method are discussed briefly, with references to more detailed literature. The emphasis is on examples that illustrate how to use the results in musical analysis. The methods can be divided into two groups: general classical methods and specific new methods developed to solve particular questions in music. Examples illustrate on one hand how standard statistical methods can be used to obtain quantitative answers to musicological questions. On the other hand, the development of more specific methodology illustrates how one may design new statistical models to answer specific questions. The data examples are kept simple in order to be understandable without extended musicological terminology. This implies many simplifications from the point of view of music theory – and leaves scope for more sophisticated analysis that may be carried out in future research. Perhaps this book will inspire the reader to join the effort.

Chapters are essentially independent to allow selective reading. Since the book describes a large variety of statistical methods in a nutshell, it can be used as a quick reference for applied statistics, with examples from musicology.

I would like to thank the following libraries, institutes, and museums for their permission to print various pictures, manuscripts, facsimiles, and photographs: Zentralbibliothek Zürich (Ruth Häusler, Handschriftenabteilung; Aniko Ladanyi and Michael Kotrba, Graphische Sammlung); Belmont Music Publishers (Anne Wirth); Philippe Gontier, Paris; Österreichische Post AG; Deutsche Post AG; Elisabeth von Janoza-Bzowski, Düsseldorf; University Library Heidelberg; Galerie Neuer Meister, Dresden; Robert-Sterl-Haus (K.M. Mieth); Béla Bartók Memorial House (János Szirányi); Frank Martin Society (Maria Martin); Karadar-Bertoldi Ensemble (Prof. Francesco Bertoldi); col legno (Wulf Weinmann). Thanks also to B. Repp for providing us with the tempo data for Schumann’s Träumerei. I would also like to thank numerous colleagues from mathematics, statistics, and musicology who encouraged me to write this book. Finally, I would like to thank my wife and my daughter for their encouragement and support, without which this book could not have been written.

Jan Beran
Konstanz, March 2003


CHAPTER 1

Some mathematical foundations of music

1.1 General background

The study of music by means of mathematics goes back several thousand years. Well documented are, for instance, mathematical and philosophical studies by the Pythagorean school in ancient Greece (see e.g. van der Waerden 1979). Advances in mathematics, computer science, psychology, semiotics, and related fields, together with technological progress (in particular computer technology), led to a revival of quantitative thinking in music in the last two to three decades (see e.g. Archibald 1972, Solomon 1973, Schnitzler 1976, Balzano 1980, Götze and Wille 1985, Lewin 1987, Mazzola 1990a, 2002, Vuza 1991, 1992a,b, 1993, Keil 1991, Lendvai 1993, Lindley and Turner-Smith 1993, Genevois and Orlarey 1997, Johnson 1997; also see Hofstadter 1999, Andreatta et al. 2001, Leyton 2001, and Babbitt 1960, 1961, 1987, Forte 1964, 1973, 1989, Rahn 1980, Morris 1987, 1995, Andreatta 1997; for early accounts of mathematical analysis of music also see Graeser 1924, Perle 1955, Norden 1964). Many recent references can be found in specialized journals such as Computing in Musicology, Music Theory Online, Perspectives of New Music, Journal of New Music Research, Integral, Music Perception, and Music Theory Spectrum, to name a few.

Music is, to a large extent, the result of a subconscious intuitive “process”. The basic question of quantitative musical analysis is in how far music may nevertheless be described or explained partially in a quantitative manner. The German philosopher and mathematician Leibniz (1646-1716) (Figure 1.5) called music the “arithmetic of the soul”. This is a profound philosophical statement; however, the difficulty is to formulate what exactly it may mean. Some composers, notably in the 20th century, consciously used mathematical elements in their compositions. Typical examples are permutations, the golden section, transformations in two- or higher-dimensional spaces, random numbers, and fractals (see e.g. Schönberg, Webern, Bartók, Xenakis, Cage, Lutosławski, Eimert, Kagel, Stockhausen, Boulez, Ligeti, Barlow; Figures 1.1, 1.4, 1.15). More generally, conscious “logical” construction is an inherent part of composition. For instance, the forms of sonata and symphony were developed based on reflections about well-balanced proportions. The tormenting search for “logical perfection” is well documented in Beethoven’s famous sketchbooks. Similarly, the art of counterpoint that culminated in J.S. Bach’s (Figure 1.2) work relies to a high degree on intrinsically mathematical principles. A rather peculiar early account of explicit applications of mathematics is the use of permutations in change ringing in English churches since the 10th century (Fletcher 1956, Price 1969, Stewart 1992, White 1983, 1985, 1987, Wilson 1965). More standard are simple symmetries, such as retrograde (e.g. Crab fugue, or Canon cancricans), inversion, arpeggio, or augmentation. A curious example of this sort is Mozart’s “Spiegel Duett” (or mirror duet, Figures 1.6, 1.7; the attribution to Mozart is actually uncertain). In the 20th century, composers such as Messiaen or Xenakis (Xenakis 1971; Figure 1.15) attempted to develop mathematical theories that would lead to new techniques of composition. From a strictly mathematical point of view, their derivations are not always exact. Nevertheless, their artistic contributions were very innovative and inspiring. More recent, mathematically stringent approaches to music theory, or certain aspects of it, are based on modern tools of abstract mathematics, such as algebra, algebraic geometry, and mathematical statistics (see e.g. Reiner 1985, Mazzola 1985, 1990a, 2002, Lewin 1987, Fripertinger 1991, 1999, 2001, Beran and Mazzola 1992, 1999a,b, 2000, Read 1997, Fleischer et al. 2000, Fleischer 2003).

Figure 1.1 Quantitative analysis of music helps to understand creative processes. (Pierre Boulez, photograph courtesy of Philippe Gontier, Paris; and “Jim” by J.B.)

Figure 1.2 J.S. Bach (1685-1750). (Engraving by L. Sichling after a painting by Elias Gottlob Haussmann, 1746; courtesy of Zentralbibliothek Zürich.)

The most obvious connection between music and mathematics is due to the fact that music is communicated in the form of sound waves. Musical sounds can therefore be studied by means of physical equations. Already in ancient Greece (around the 5th century BC), Pythagoreans found the relationship between certain musical intervals and numeric proportions, and calculated intervals of selected scales. These results were probably obtained by studying the vibration of strings. Similar studies were done in other cultures, but are mostly not well documented. In practical terms, these studies led to singling out specific frequencies (or frequency proportions) as “musically useful” and to the development of various scales and harmonic systems. A more systematic approach to the physics of musical sounds, music perception, and acoustics was initiated in the second half of the 19th century by path-breaking contributions of Helmholtz (1863) and other physicists (see e.g. Rayleigh 1896). Since then, a vast amount of knowledge has been accumulated in this field (see e.g. Backus 1969, 1977, Morse and Ingard 1968, 1986, Benade 1976, 1990, Rigden 1977, Yost 1977, Hall 1980, Berg and Stork 1995, Pierce 1983, Cremer 1984, Rossing 1984, 1990, 2000, Johnston 1989, Fletcher and Rossing 1991, Graff 1975, 1991, Roederer 1995, Rossing et al. 1995, Howard and Angus 1996, Beament 1997, Crocker 1998, Nederveen 1998, Orbach 1999, Kinsler et al. 2000, Raichel 2000). For a historical account of musical acoustics see e.g. Bailhache (2001).

It may appear at first that once we have mastered modeling musical sounds by physical equations, music is understood. This is, however, not so. Music is not just an arbitrary collection of sounds – music is “organized sound”.


Figure 1.3 Ludwig van Beethoven (1770-1827). (Drawing by E. Dürck after a painting by J.K. Stieler, 1819; courtesy of Zentralbibliothek Zürich.)

Figure 1.4 Anton Webern (1883-1945). (Courtesy of Österreichische Post AG.)


Figure 1.5 Gottfried Wilhelm Leibniz (1646-1716). (Courtesy of Deutsche Post AG and Elisabeth von Janota-Bzowski.)

Physical equations for sound waves only describe the propagation of air pressure. They do not provide, by themselves, an understanding of how and why certain sounds are connected, nor do they tell us anything (at least not directly) about the effect on the audience. As far as structure is concerned, one may even argue – for the sake of argument – that music does not necessarily need “physical realization” in the form of a sound. Musicians are able to hear music just by looking at a score. Beethoven (Figures 1.3, 1.16) composed his ultimate masterpieces after he lost his hearing. Thus, on an abstract level, music can be considered as an organized structure that follows certain laws. This structure may or may not express feelings of the composer. Usually, the structure is communicated to the audience by means of physical sounds – which in turn trigger an emotional experience of the audience (not necessarily identical with the one intended by the composer). The structure itself can be analyzed, at least partially, using suitable mathematical structures. Note, however, that understanding the mathematical structure does not necessarily tell us anything about the effect on the audience. Moreover, any mathematical structure used for analyzing music describes certain selected aspects only. For instance, studying symmetries of motifs in a composition by purely algebraic means ignores psychological, historical, perceptual, and other important issues. Ideally, all relevant scientific disciplines would need to interact to gain a broad understanding. A further complication is that the existence of a unique “truth” is by no means certain (and is in fact rather unlikely). For instance, a composition may contain certain structures that are important for some listeners but are ignored by others. This problem became apparent in the early 20th century with the introduction of 12-tone music.
The general public was not ready to perceive the complex structures of dodecaphonic music and was rather appalled by the seemingly chaotic noise, whereas a minority of “specialized” listeners was enthusiastic. Another example is the comparison of performances. Which pianist is the best? This question has no unique answer, if any. There is no fixed gold standard and no unique solution that would represent the ultimate unchangeable truth. What one may hope for at most is a classification into types of performances that are characterized by certain quantifiable properties – without attaching a subjective judgment of “quality”.

The main focus of this book is statistics. Statistics is essential for connecting theoretical mathematical concepts with observed “reality”, to find and explore structures empirically, and to develop models that can be applied and tested in practice. Until recently, traditional musical analysis was mostly carried out in a purely qualitative, and at least partially subjective, manner. Applications of statistical methods to questions in musicology and performance research are very rare (for examples see Yaglom and Yaglom 1967, Repp 1992, de la Motte-Haber 1996, Steinberg 1995, Waugh 1996, Nettheim 1997, Widmer 2001, Stamatatos and Widmer 2002) and mostly consist of simple applications of standard statistical tools to confirm results or conjectures that had been known or “derived” before by musicological, historic, or psychological reasoning. An interesting overview of statistical applications in music, and many references, can be found in Nettheim (1997). The lack of quantitative analysis may be explained, in part, by the impossibility of collecting “objective” data. Meanwhile, however, due to modern computer technology, an increasing number of musical data are becoming available. An in-depth statistical analysis of music is therefore no longer unrealistic. On the theoretical side, the development of sophisticated mathematical tools such as algebra, algebraic geometry, and mathematical statistics, and their adaptation to the specific needs of music theory, made it possible to pursue a more quantitative path. Because of the complex, highly organized nature of music, existing, mostly qualitative, knowledge about music must be incorporated into the process of mathematical and statistical modeling. The statistical methods that will be discussed in the subsequent chapters can be divided into two categories:

1. Classical methods of mathematical statistics and exploratory data analysis: many classical methods can be applied to analyze musical structures, provided that suitable data are available. A number of examples will be discussed. The examples are relatively simple from the point of view of musicology, the purpose being to illustrate how the appropriate use of statistics can yield interesting results, and to stimulate the reader to invent his or her own statistical methods that are appropriate for answering specific musicological questions.

2. New methods developed specifically to answer concrete questions in musicology: in the last few years, questions in music composition and performance led to the development of new statistical methods that are specifically designed to solve questions such as classification of performance styles, identification and modeling of metric, melodic, and harmonic structures, quantification of similarities and differences between compositions and performance styles, automatic identification of musical events and structures from audio signals, etc. Some of these methods will be discussed in detail.

A mathematical discipline that is concerned specifically with abstract definitions of structures is algebra. Some elements of basic algebra are therefore discussed in the next section. Naturally, depending on the context, other mathematical disciplines also play an equally important role in musical analysis, and will be discussed later where necessary. Readers who are familiar with modern algebra may skip the following section. A few examples that illustrate applications of algebraic structures to music are presented in Section 1.3. An extended account of mathematical approaches to music based on algebra and algebraic geometry is given, for instance, in Mazzola (1990a, 2002) (also see Lewin 1987 and Benson 1995-2002).

1.2 Some elements of algebra

1.2.1 Motivation

Algebraic considerations in music theory have gained increasing popularity in recent years. The reason is that there are striking similarities between musical and algebraic structures. Why this is so can be illustrated by a simple example: notes (or rather pitches) that differ by an octave can be considered equivalent with respect to their harmonic “meaning”. If an instrument is tuned according to equal temperament, then, from the harmonic perspective, there are only 12 different notes. These can be represented as integers modulo 12. Similarly, there are only 12 different intervals. This means that we are dealing with the set Z12 = {0, 1, ..., 11}. The sum of two elements x, y ∈ Z12, z = x + y, is interpreted as the note/interval resulting from “increasing” the note/interval x by the interval y. The set Z12 of notes (intervals) is then an additive group (see definition below).
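The Z12 arithmetic just described is easy to experiment with. The following minimal sketch (plain Python; the note-name labels and the convention C = 0 are illustrative assumptions, not part of the text) represents pitch classes as integers modulo 12 and adds intervals:

```python
# Pitch classes in twelve-tone equal temperament, modeled as Z_12.
# The labels below follow the common (but here assumed) convention C = 0.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def add_interval(pitch_class: int, interval: int) -> int:
    """Transpose a pitch class by an interval; both are taken modulo 12."""
    return (pitch_class + interval) % 12

# E (4) raised by a perfect fifth (7 semitones) gives B (11):
print(NOTE_NAMES[add_interval(4, 7)])   # B
# B (11) raised by a major second (2) wraps around the octave to C# (1):
print(NOTE_NAMES[add_interval(11, 2)])  # C#
```

The wrap-around in the second call is exactly the octave equivalence that motivates working modulo 12.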

1.2.2 Definitions and results

We discuss some important concepts of algebra that are useful to describe musical structures. A more comprehensive overview of modern algebra can be found in standard textbooks such as those by Albert (1956), Herstein (1975), Zassenhaus (1999), Gilbert (2002), and Rotman (2002). The most fundamental structures in algebra are group, ring, field, module, and vector space.

Definition 1 Let G be a nonempty set with a binary operation + such that a + b ∈ G for all a, b ∈ G and the following holds:
1. (a + b) + c = a + (b + c) (Associativity)
2. There exists a zero element 0 ∈ G such that 0 + a = a + 0 = a for all a ∈ G
3. For each a ∈ G, there exists an inverse element (−a) ∈ G such that (−a) + a = a + (−a) = 0
Then (G, +) is called a group. The group (G, +) is called commutative (or abelian) if for each a, b ∈ G, a + b = b + a. The number of elements in G is called the order of the group and is denoted by o(G). If the order is finite, then G is called a finite group.

In a multiplicative way this can be written as

Definition 2 Let G be a nonempty set with a binary operation · such that a · b ∈ G for all a, b ∈ G and the following holds:
1. (a · b) · c = a · (b · c) (Associativity)
2. There exists an identity element e ∈ G such that e · a = a · e = a for all a ∈ G
3. For each a ∈ G, there exists an inverse element a^{-1} ∈ G such that a^{-1} · a = a · a^{-1} = e
Then (G, ·) is called a group. The group (G, ·) is called commutative (or abelian) if for each a, b ∈ G, a · b = b · a.

For subsets we have

Definition 3 Let (G, ·) and (H, ·) be groups with H ⊂ G. Then H is called a subgroup of G.

Some groups can be generated by a single element of the group:

Definition 4 Let (G, ·) be a group with n < ∞ elements denoted by a^i (i = 0, 1, ..., n − 1) and such that
1. a^0 = a^n = e
2. a^i a^j = a^{i+j} if i + j ≤ n and a^i a^j = a^{i+j−n} if i + j > n
Then G is called a cyclic group. Furthermore, if G = (a) = {a^i : i ∈ Z}, where a^i denotes the product with all i terms equal to a, then a is called a generator of G.

An important notion is given in the following

Definition 5 Let G be a group that “acts” on a set X by assigning to each x ∈ X and g ∈ G an element g(x) ∈ X. Then, for each x ∈ X, the set G(x) = {y : y = g(x), g ∈ G} is called the orbit of x.

Note that, given a group G that acts on X, the set X is partitioned into disjoint orbits. If there are two operations + and ·, then a ring is defined by

Definition 6 Let R be a nonempty set with two binary operations + and · such that the following holds:
1. (R, +) is an abelian group
2. a · b ∈ R for all a, b ∈ R
3. (a · b) · c = a · (b · c) (Associativity)
4. a · (b + c) = a · b + a · c and (b + c) · a = b · a + c · a (distributive law)
Then (R, +, ·) is called an (associative) ring. If also a · b = b · a for all a, b ∈ R, then R is called a commutative ring.
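Definition 4 can be explored numerically for the additive group Z12 of pitch classes: an element g generates the group exactly when its multiples reach all 12 elements (equivalently, when gcd(g, 12) = 1). The sketch below is an illustration in plain Python; the musical reading of 7 as a perfect fifth is an aside, not taken from the text.

```python
from math import gcd

def generates(g: int, n: int = 12) -> bool:
    """Check whether g generates the cyclic group (Z_n, +),
    i.e. whether the multiples of g exhaust all n elements."""
    return len({(g * k) % n for k in range(n)}) == n

# The generators of Z_12 are exactly the residues coprime to 12.
gens = [g for g in range(12) if generates(g)]
print(gens)  # [1, 5, 7, 11]
assert all(gcd(g, 12) == 1 for g in gens)

# Musically: 7 semitones (a perfect fifth) generate all 12 pitch
# classes -- the circle of fifths.
circle_of_fifths = [(7 * k) % 12 for k in range(12)]
print(circle_of_fifths)  # [0, 7, 2, 9, 4, 11, 6, 1, 8, 3, 10, 5]
```

Note that 2 (a whole tone) is not a generator: its multiples only reach the whole-tone scale {0, 2, 4, 6, 8, 10}, a proper subgroup.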

Further useful definitions are:

Definition 7 Let R be a commutative ring and a ∈ R, a ≠ 0, such that there exists an element b ∈ R, b ≠ 0, with a · b = 0. Then a is called a zero-divisor. If R has no zero-divisors, then it is called an integral domain.

Definition 8 Let R be a ring such that (R \ {0}, ·) is a group. Then R is called a division ring. A commutative division ring is called a field.

A module is defined as follows:

Definition 9 Let (R, +, ·) be a ring and M a nonempty set with a binary operation +. Assume that

1. (M, +) is an abelian group

2. For every r ∈ R, m ∈ M, there exists an element r · m ∈ M

3. r · (a + b) = r · a + r · b for every r ∈ R, a, b ∈ M

4. r · (s · a) = (r · s) · a for every r, s ∈ R, a ∈ M

5. (r + s) · a = r · a + s · a for every r, s ∈ R, a ∈ M

Then M is called an R-module or module over R. If R has a unit element e and if e · a = a for all a ∈ M, then M is called a unital R-module. A unital R-module where R is a field is called a vector space over R.

There is an enormous amount of literature on groups, rings, modules, etc. Some of the standard results are summarized, for instance, in textbooks such as those given above. Here, we cite only a few theorems that are especially useful in music. We start with a few more definitions.

Definition 10 Let H ⊂ G be a subgroup of G such that for every a ∈ G, a · H · a^{-1} ⊂ H. Then H is called a normal subgroup of G.

Definition 11 Let G be a group such that the only normal subgroups are H = G and H = {e}. Then G is called a simple group.

Definition 12 Let G be a group and H1, ..., Hn normal subgroups such that

G = H1 · H2 · · · Hn   (1.1)

and any a ∈ G can be written uniquely as a product

a = b1 · b2 · · · bn   (1.2)

with bi ∈ Hi. Then G is said to be the (internal) direct product of H1, ..., Hn.

©2004 CRC Press LLC

Page 16: Statistics in Musicology

Definition 13 Let G1 and G2 be two groups, define G = G1 × G2 = {(a, b) : a ∈ G1, b ∈ G2} and the operation · by (a1, b1) · (a2, b2) = (a1 · a2, b1 · b2). Then the group (G, ·) is called the (external) direct product of G1 and G2.

Definition 14 Let M be an R-module and M1, ..., Mn submodules such that every a ∈ M can be written uniquely as a sum

a = a1 + a2 + ... + an   (1.3)

with ai ∈ Mi. Then M is said to be the direct sum of M1, ..., Mn.

We now turn to the question of which subgroups of finite groups exist.

Theorem 1 Let H be a subgroup of a finite group G. Then o(H) is a divisor of o(G).
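Theorem 1 (Lagrange's theorem) can be verified by brute force on the pitch-class group (Z12, +): every cyclic subgroup generated by a single element has an order dividing o(Z12) = 12. A minimal sketch in Python (the musical gloss in the comment is an illustrative aside):

```python
def subgroup_generated_by(a: int, n: int = 12) -> set:
    """The cyclic subgroup of (Z_n, +) generated by a:
    the set of all multiples of a modulo n."""
    return {(a * k) % n for k in range(n)}

# Orders of the cyclic subgroups of Z_12; each divides o(Z_12) = 12.
orders = {a: len(subgroup_generated_by(a)) for a in range(12)}
assert all(12 % order == 0 for order in orders.values())
print(orders[3], subgroup_generated_by(3))
# a = 3 yields the order-4 subgroup {0, 3, 6, 9}, musically a
# diminished-seventh cycle.
```

The possible subgroup orders 1, 2, 3, 4, 6, 12 are exactly the divisors of 12, as the theorem requires.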

Theorem 2 (Sylow) Let G be a group and p a prime number such that p^m is a divisor of o(G). Then G has a subgroup H with o(H) = p^m.

Definition 15 A subgroup H ⊂ G with o(H) = p^m, where p^m is a divisor of o(G) but p^{m+1} is not a divisor, is called a p-Sylow subgroup.

The next theorems help to decide whether a ring is a field.

Theorem 3 Let R be a finite integral domain. Then R is a field.

Corollary 1 Let p be a prime number and R = Zp = {x mod p : x ∈ N} be the set of integers modulo p (with the operations + and · defined accordingly). Then R is a field.

An essential way to compare algebraic structures is in terms of operation-preserving mappings. The following definitions are needed:

Definition 16 Let (G1, ·) and (G2, ·) be two groups. A mapping g : G1 → G2 such that

g(a · b) = g(a) · g(b)   (1.4)

is called a (group-)homomorphism. If g is a one-to-one (group-)homomorphism, then it is called an isomorphism (or group-isomorphism). Moreover, if G1 = G2, then g is called an automorphism (or group-automorphism).

Definition 17 Two groups G1, G2 are called isomorphic if there is an isomorphism g : G1 → G2.

Analogous definitions can be given for rings and modules:

Definition 18 Let R1 and R2 be two rings. A mapping g : R1 → R2 such that

g(a + b) = g(a) + g(b)   (1.5)

and

g(a · b) = g(a) · g(b)   (1.6)

is called a (ring-)homomorphism. If g is a one-to-one (ring-)homomorphism, then it is called an isomorphism (or ring-isomorphism). Furthermore, if R1 = R2, then g is called an automorphism (or ring-automorphism).

©2004 CRC Press LLC

Page 17: Statistics in Musicology

Definition 19 Two rings R1, R2 are called isomorphic if there is an isomorphism g : R1 → R2.

Definition 20 Let M1 and M2 be two modules over R. A mapping g : M1 → M2 such that for every a, b ∈ M1, r ∈ R,

g(a + b) = g(a) + g(b)   (1.7)

and

g(r · a) = r · g(a)   (1.8)

is called a (module-)homomorphism (or a linear transformation). If g is a one-to-one (module-)homomorphism, then it is called an isomorphism (or module-isomorphism). Furthermore, if M1 = M2, then g is called an automorphism (or module-automorphism).

Definition 21 Two modules M1, M2 are called isomorphic if there is an isomorphism g : M1 → M2.

Finally, a general family of transformations is defined by

Definition 22 Let g : M1 → M2 be a (module-)homomorphism. Then a mapping h : M1 → M2 defined by

h(a) = c + g(a)   (1.9)

with c ∈ M2 is called an affine transformation. If M1 = M2, then h is called a symmetry of M. Moreover, if h is invertible, then it is called an invertible symmetry of M.

Studying properties of groups is equivalent to studying groups of automorphisms:

Theorem 4 (Cayley’s theorem) Let G be a group. Then there is a set S such that G is isomorphic to a subgroup of A(S), where A(S) is the set of all one-to-one mappings of S onto itself.

Definition 23 Let S be a finite set with n elements. Then the group (A(S), ◦) (where a ◦ b denotes successive application of the functions a and b) is called the symmetric group of order n, and is denoted by Sn.

Note that Sn is isomorphic to the group of permutations of the numbers 1, 2, ..., n, and has n! elements. Another important concept is motivated by representation in coordinates, as we are used to from Euclidean geometry. The representation follows since, in terms of isomorphy, the internal and external direct product can be shown to be equivalent:

Theorem 5 Let G = H1 · H2 · · · Hn be the internal direct product of H1, ..., Hn and G* = H1 × H2 × ... × Hn the external direct product. Then G and G* are isomorphic, through the isomorphism g : G* → G defined by g(a1, ..., an) = a1 · a2 · ... · an.

This theorem implies that one does not need to distinguish between the internal and external direct product. The analogous result holds for modules:

©2004 CRC Press LLC


Theorem 6 Let M be a direct sum of M1, ..., Mn. Then M is isomorphic to the module M∗ = {(a1, a2, ..., an) : ai ∈ Mi} with the operations (a1, a2, ...) + (b1, b2, ...) = (a1 + b1, a2 + b2, ...) and r · (a1, a2, ...) = (r · a1, r · a2, ...).

Thus, a module M = M1 + M2 + ... + Mn can be described in terms of its coordinates with respect to Mi (i = 1, ..., n), and the structure of M is known as soon as we know the structure of each Mi (i = 1, ..., n).

Direct products can be used, in particular, to characterize the structure of finite abelian groups:

Theorem 7 Let (G, ·) be a finite commutative group. Then G is isomorphic to the direct product of its Sylow subgroups.

Theorem 8 Let (G, ·) be a finite commutative group. Then G is the direct product of cyclic groups.

Similar, but slightly more involved, results can be shown for modules; they will not be needed here.

1.3 Specific applications in music

In the following, the usefulness of algebraic structures in music is illustrated by a few selected examples. This is only a small selection from the extensive literature on this topic. For further reading see e.g. Graeser (1924), Schönberg (1950), Perle (1955), Fletcher (1956), Babbitt (1960, 1961), Price (1969), Archibald (1972), Halsey and Hewitt (1978), Balzano (1980), Rahn (1980), Götze and Wille (1985), Reiner (1985), Berry (1987), Mazzola (1990a, 2002 and references therein), Vuza (1991, 1992a,b, 1993), Fripertinger (1991), Lendvai (1993), Benson (1995-2002), Read (1997), Noll (1997), Andreatta (1997), Stange-Elbe (2000), among others.

1.3.1 The Mathieu group

It can be shown that finite simple groups fall into families that can be described explicitly, except for 26 so-called sporadic groups. One such group is the Mathieu group M12, which was discovered by the French mathematician Émile Mathieu in the 19th century (Mathieu 1861, 1873; also see e.g. Conway and Sloane 1988). In their study of probabilistic properties of (card) shuffling, Diaconis et al. (1983) show that M12 can be generated by two permutations (which they call Mongean shuffles), namely

π1 = ( 1 2 3 4 5 6  7 8  9 10 11 12
       7 6 8 5 9 4 10 3 11  2 12  1 )   (1.10)

and

π2 = ( 1 2 3 4 5 6 7  8 9 10 11 12
       6 7 5 8 4 9 3 10 2 11  1 12 )   (1.11)


where the lower rows denote the images of the numbers 1, ..., 12. The order of this group is o(M12) = 95040(!). An interesting application of these permutations can be found in Île de feu 2 by Olivier Messiaen (Berry 1987), where π1 and π2 are used to generate sequences of tones and durations.
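As a sanity check, the group generated by the two Mongean shuffles can be enumerated by brute-force closure under composition; the following Python sketch (names and representation are ours, not from the text — permutations are stored as 0-based tuples of images) recovers the stated order 95040:

```python
# Mongean shuffles from Diaconis et al. (1983), converted to 0-based image tuples:
# pi[i] is the image of position i under the permutation.
pi1 = tuple(x - 1 for x in (7, 6, 8, 5, 9, 4, 10, 3, 11, 2, 12, 1))
pi2 = tuple(x - 1 for x in (6, 7, 5, 8, 4, 9, 3, 10, 2, 11, 1, 12))

def compose(a, b):
    """(a o b)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(12))

# Breadth-first closure of {pi1, pi2} under composition, starting from the identity.
group = {tuple(range(12))}
frontier = list(group)
while frontier:
    nxt = []
    for g in frontier:
        for s in (pi1, pi2):
            h = compose(s, g)
            if h not in group:
                group.add(h)
                nxt.append(h)
    frontier = nxt

print(len(group))  # 95040 = o(M12)
```

The closure terminates quickly because M12 has only 95040 elements, a tiny fraction of the 12! ≈ 4.8 × 10⁸ permutations of 12 objects.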

1.3.2 Campanology

A rather peculiar example of group theory "in action" (though perhaps rather trivial mathematically) is campanology or change ringing (Fletcher 1956, Wilson 1965, Price 1969, White 1983, 1985, 1987, Stewart 1992). The art of change ringing started in England in the 17th century and is still performed today. The problem to be solved is as follows: there are k swinging bells in the church tower. One starts playing a melody that consists of a certain sequence in which the bells are played, each bell being played only once. Thus, the initial sequence is a permutation of the numbers 1, ..., k. Since it is not interesting to repeat the same melody over and over, the initial melody has to be varied. However, the bells are very heavy, so that it is not easy to change the timing of the bells. Each variation is therefore restricted, in that in each "round" only pairs of adjacent bells can exchange their positions. Thus, for instance, if k = 4 and the previous sequence was (1, 2, 3, 4), then among the permissible permutations are (2, 1, 3, 4), (1, 3, 2, 4), and (1, 2, 4, 3). A further, mainly aesthetic, restriction is that no sequence should be repeated, except that the last one is identical with the initial sequence. A typical solution to this problem is, for instance, the "Plain Bob" that starts with (1, 2, 3, 4), (2, 1, 4, 3), (2, 4, 1, 3), ... and continues until all permutations in S4 are visited.
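The simplest permissible moves — exchanging a single pair of adjacent bells — can be enumerated mechanically; a small Python sketch (the function name `adjacent_swaps` is ours):

```python
def adjacent_swaps(row):
    """All rows reachable from `row` by exchanging one pair of adjacent bells."""
    out = []
    for i in range(len(row) - 1):
        nxt = list(row)
        nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]   # swap bells at positions i, i+1
        out.append(tuple(nxt))
    return out

print(adjacent_swaps((1, 2, 3, 4)))
# [(2, 1, 3, 4), (1, 3, 2, 4), (1, 2, 4, 3)]
```

A full method such as Plain Bob also uses rounds in which two disjoint adjacent pairs swap simultaneously (e.g. (1, 2, 3, 4) → (2, 1, 4, 3)); the sketch above covers only single swaps.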

1.3.3 Representation of music

Many aspects of music can be "embedded" in a suitable algebraic module (see e.g. Mazzola 1990a). Here are some examples:

1. Apart from glissando effects, the essential frequencies in most types of music are of the form

ω = ω0 ∏_{i=1}^{K} pi^{xi}   (1.12)

where K < ∞, ω0 is a fixed basic frequency, the pi are certain fixed prime numbers, and xi ∈ Q. Thus,

ψ = log ω = ψ0 + ∑_{i=1}^{K} xi ψi   (1.13)

where ψ0 = log ω0, ψi = log pi (i ≥ 1). Let Ψ = {ψ : ψ = ∑_{i=1}^{K} xi ψi, xi ∈ Q} be the set of all log-frequencies generated this way. Then Ψ is a module over Q. Two typical examples are:


(a) ω0 = 440 Hz, K = 3, p1 = 2, p2 = 3, p3 = 5: This is the so-called Euler module in which most Western music operates. An important subset consists of frequencies of the just intonation with the pure intervals octave (frequency ratio 2), fifth (frequency ratio 3/2), and major third (frequency ratio 5/4):

ψ = log ω = log 440 + x1 log 2 + x2 log 3 + x3 log 5   (1.14)

(xi ∈ Z). The notes (frequencies) ψ can then be represented by points in the three-dimensional space of integers Z^3. Note that, using the notation a = (a1, a2, a3) and b = (b1, b2, b3), the pitch obtained by the addition c = a + b corresponds to the frequency ω0 · 2^(a1+b1) · 3^(a2+b2) · 5^(a3+b3).

(b) ω0 = 440 Hz, K = 1, p1 = 2, and x1 = p/12, where p ∈ Z: This corresponds to the well-tempered tuning, where an octave is divided into equal intervals. Thus, the ratio 2 is decomposed into 12 ratios of 2^(1/12), so that

ψ = log 440 + (p/12) log 2   (1.15)

If notes that differ by one or several octaves are considered equivalent, then we can identify the set of notes with the Z-module Z12 = {0, 1, ..., 11}.

2. Consider a finite module of notes (frequencies), such as, for instance, the well-tempered module M = Z12. Then a scale is an element of S = {(x1, ..., xk) : k ≤ |M|, xi ∈ M, xi ≠ xj (i ≠ j)}, the set of all finite vectors with distinct components.
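The two frequency systems of example 1 can be sketched in Python; the function names are ours, and the formulas are the exponentiated forms of Eq. (1.14) and (1.15):

```python
def euler_freq(a, omega0=440.0):
    """Frequency of the Euler-module point a = (a1, a2, a3) in Z^3:
    omega = omega0 * 2**a1 * 3**a2 * 5**a3 (just intonation)."""
    a1, a2, a3 = a
    return omega0 * 2.0 ** a1 * 3.0 ** a2 * 5.0 ** a3

def tempered_freq(p, omega0=440.0):
    """Well-tempered tuning: p twelfth-octave steps above omega0."""
    return omega0 * 2.0 ** (p / 12.0)

print(euler_freq((-1, 1, 0)))  # 660.0 Hz: a pure fifth (ratio 3/2) above 440 Hz
print(tempered_freq(12))       # 880.0 Hz: one octave above 440 Hz
```

Octave equivalence in the well-tempered case amounts to reducing p modulo 12, i.e. working in Z12.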

1.3.4 Classification of circular chords and other musical objects

A central element of the classical theory of harmony is the triad. An algebraic property that distinguishes harmonically important triads from other chords can be described as follows: let x1, x2, x3 ∈ Z12, such that (a) xi ≠ xj (i ≠ j) and (b) there is an "inner" symmetry g : Z12 → Z12 such that {y : y = g^k(x1), k ∈ N} = {x1, x2, x3}. It can be shown that all chords (x1, x2, x3) for which (a) and (b) hold are standard chords that are harmonically important in the traditional theory of harmony. Consider, for instance, the major triad (c, e, g) = (0, 4, 7) and the minor triad (c, e♭, g) = (0, 3, 7). For the first triad, the symmetry g(x) = 3x + 7 yields the desired result: g(0) = 7 = g, g(7) = 4 = e, and g(4) = 7 = g. For the minor triad, the only inner symmetry is g(x) = 3x + 3, with g(7) = 0 = c, g(0) = 3 = e♭, and g(3) = 0 = c. This type of classification of chords can be carried over to more complicated configurations of notes (see e.g. Mazzola 1990a, 2002, Straub 1989). In particular, musical scales can be classified by comparing their inner symmetries.
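The "inner symmetry" condition can be checked by simply iterating g until the orbit closes; a Python sketch (the helper name `orbit` is ours):

```python
def orbit(g, x0, mod=12):
    """Iterate the symmetry g on Z_mod starting from x0 until the orbit closes."""
    seen, x = [], x0 % mod
    while x not in seen:
        seen.append(x)
        x = g(x) % mod
    return set(seen)

# major triad (c, e, g) = (0, 4, 7): inner symmetry g(x) = 3x + 7
print(orbit(lambda x: 3 * x + 7, 0))   # {0, 4, 7}
# minor triad (c, e-flat, g) = (0, 3, 7): inner symmetry g(x) = 3x + 3
print(orbit(lambda x: 3 * x + 3, 7))   # {0, 3, 7}
```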


1.3.5 Torus of thirds

Consider the group G = (Z12, +) of pitches modulo octave. Then G is isomorphic to the direct sum of the Sylow groups Z3 and Z4 via the isomorphism

g : Z12 → Z3 + Z4,   (1.16)
x ↦ y = (y1, y2) = (x mod 3, −x mod 4)   (1.17)

Geometrically, the elements of Z3 + Z4 can be represented as points on a torus, y1 giving the position on the vertical meridian and y2 the position on the horizontal equatorial circle (Figure 1.8). This representation has a musical meaning: a movement along a meridian corresponds to a major third, whereas a movement along a horizontal circle corresponds to a minor third. One can then define the "torus distance" d_torus(x, y) as the minimal number of steps needed to move from x to y. The value of d_torus(x, y) expresses to what extent there is a third-relationship between x and y. The possible values of d_torus are 0 (if x = y), 1, 2, and 3 (weakest third-relationship). Note that d_torus can be decomposed into d3 + d4, where d3 counts the number of meridian steps and d4 the number of equatorial steps.
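A minimal sketch of this distance, assuming the coordinates (x mod 3, −x mod 4) described above (the function name `d_torus` is ours); since distance is symmetric, the sign of the Z4 coordinate drops out:

```python
def d_torus(x, y):
    """Minimal number of third-steps between pitch classes x, y in Z12,
    on the torus Z3 (major-third meridians) + Z4 (minor-third circles)."""
    a = (x - y) % 3
    b = (x - y) % 4
    d3 = min(a, 3 - a)   # shortest way around the 3-cycle
    d4 = min(b, 4 - b)   # shortest way around the 4-cycle
    return d3 + d4

print(d_torus(0, 4))   # 1: c -> e, one major third
print(d_torus(0, 3))   # 1: c -> e-flat, one minor third
print(d_torus(0, 6))   # 2: tritone, two minor thirds
```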

1.3.6 Transformations

For suitably chosen integers p1, p2, p3, p4, consider the four-dimensional module M = Zp1 × Zp2 × Zp3 × Zp4 over Z, where the coordinates represent onset time, pitch (well-tempered tuning if p2 = 12), duration, and volume. Transformations in this space play an essential role in music. A selection of historically relevant transformations used by classical composers is summarized in Table 1.1 (also see Figure 1.13).

Generally, one may say that affine transformations are the most important, and among these the invertible ones. In particular, it can be shown that each invertible symmetry of Z12 can be written as a product (in the group of symmetries Symm(Z12)) of the following musically meaningful transformations:
• Multiplication by −1 (inversion);
• Multiplication by 5 (reordering of notes according to the circle of fourths);
• Addition of 3 (transposition by a minor third);
• Addition of 4 (transposition by a major third).
All these transformations have been used by composers for many centuries. Some examples of apparent similarities between groups of notes (or motifs) are shown in Figures 1.10 through 1.12. In order not to clutter the pictures, only a small selection of similar motifs is marked. In dodecaphonic and serial music, transformation groups have been applied systematically (see e.g. Figure 1.9). For instance, in Schönberg's Orchestervariationen op.


Table 1.1 Some affine transformations used in classical music

  Function                                              Musical meaning

  Shift: f(x) = x + a                                   Transposition, repetition,
                                                        change of duration,
                                                        change of loudness

  Shear of x = (x1, ..., x4)^t                          Arpeggio
  w.r.t. the line y = βo + t · (0, 1, 0, 0):
  f(x) = x + a · (0, 1, 0, 0) for x not on the line,
  f(x) = x for x on the line

  Reflection, e.g. w.r.t.                               Retrograde, inversion
  v = (a, 0, 0, 0):
  f(x) = (a − (x1 − a), x2, x3, x4)

  Dilatation, e.g. w.r.t. pitch:                        Augmentation
  f(x) = (x1, a · x2, x3, x4)

  Exchange of coordinates:                              Exchange of "parameters"
  f(x) = (x2, x1, x3, x4)                               (20th century)

31, the full orbit generated by inversion, retrograde, and transposition is used. Webern used 12-tone series that are diagonally symmetric in the two-dimensional space spanned by pitch and onset time. Other famous examples include Eimert's rotation by 45 degrees together with a dilatation by √2 (Eimert 1964), and serial compositions such as Boulez's "Structures" and Stockhausen's "Kontra-Punkte". With advanced computer technology (e.g. composition soft- and hardware such as Xenakis' UPIC graphics/computer system or the recently developed Presto software by Mazzola 1989/1994), the application of affine transformations in musical spaces of arbitrary dimension is no longer the tedious work of the early dodecaphonic era. On the contrary, the practical ease and enormous artistic flexibility lead to an increasing popularity of computer-aided transformations among contemporary composers (see e.g. Iannis Xenakis, Kurt Dahlke, Wilfried Jentzsch, Guerino Mazzola 1990b, Dieter Salbert, Karl-Heinz Schoppner, Tamas Ungvary, Jan Beran 1987, 1991, 1992, 2000; cf. Figure 1.14).
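The claim that the four listed transformations generate the invertible symmetries of Z12 can be checked by brute-force closure; a Python sketch (names and representation are ours — maps are stored as value tables on Z12):

```python
def tbl(f):
    """Value table of a map on Z12."""
    return tuple(f(x) % 12 for x in range(12))

gens = [tbl(lambda x: -x),       # multiplication by -1 (inversion)
        tbl(lambda x: 5 * x),    # multiplication by 5 (circle of fourths)
        tbl(lambda x: x + 3),    # addition of 3 (minor-third transposition)
        tbl(lambda x: x + 4)]    # addition of 4 (major-third transposition)

def compose(a, b):
    return tuple(a[b[i]] for i in range(12))

closure = {tuple(range(12))}
frontier = list(closure)
while frontier:
    nxt = []
    for g in frontier:
        for s in gens:
            h = compose(s, g)
            if h not in closure:
                closure.add(h)
                nxt.append(h)
    frontier = nxt

# all invertible affine maps x -> a*x + b on Z12 have a in {1, 5, 7, 11}, b in Z12,
# so there are 4 * 12 = 48 of them
print(len(closure))  # 48
```

The count works out because +3 and +4 together generate all 12 translations (gcd(3, 4) = 1), while −1 and 5 generate the full unit group {1, 5, 7, 11} of Z12.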


[Score: "Spiegel-Duett", Allegro, violin part.]

Figure 1.6 W.A. Mozart (1756-1791) (authorship uncertain) – Spiegel-Duett.


Figure 1.7 Wolfgang Amadeus Mozart (1756-1791). (Engraving by F. Müller after a painting by J.W. Schmidt; courtesy of Zentralbibliothek Zürich.)

Figure 1.8 The torus of thirds Z3 + Z4.


Figure 1.9 Arnold Schönberg – Sketch for the piano concerto op. 42 – notes with tone row and its inversions and transpositions. (Used by permission of Belmont Music Publishers.)

Figure 1.10 Notes of "Air" by Henry Purcell. (For better visibility, only a small selection of related "motifs" is marked.)


Figure 1.11 Notes of Fugue No. 1 (first half) from "Das Wohltemperierte Klavier" by J.S. Bach. (For better visibility, only a small selection of related "motifs" is marked.)

Figure 1.12 Notes of op. 68, No. 2 from "Album für die Jugend" by Robert Schumann. (For better visibility, only a small selection of related "motifs" is marked.)


Figure 1.13 A miraculous transformation caused by high exposure to Wagner operas. (Caricature from a 19th century newspaper; courtesy of Zentralbibliothek Zürich.)

Figure 1.14 Graphical representation of pitch and onset time in Z271 together with instrumentation of polygonal areas. (Excerpt from Santi – Piano concerto No. 2 by Jan Beran, col legno CD 20062; courtesy of col legno, Germany.)


Figure 1.15 Iannis Xenakis (1922-1998). (Courtesy of Philippe Gontier, Paris.)

Figure 1.16 Ludwig van Beethoven (1770-1827). (Courtesy of Zentralbibliothek Zürich.)


CHAPTER 2

Exploratory data mining in musical spaces

2.1 Musical motivation

The primary aim of descriptive statistics is to summarize data by a small set of numbers or graphical displays, with the purpose of finding typical relevant features. An in-depth descriptive analysis explores the data as far as possible in the hope of finding anything interesting. This activity is therefore also called "exploratory data analysis" (EDA; see Tukey 1977), or "data mining". EDA does not require a priori model assumptions – the purpose is simply free exploration. Many exploratory tools are, however, inspired by probabilistic models and designed to detect features that may be captured by these.

Descriptive or exploratory analysis is of special interest in music. The reason is that in music very subtle local changes play an important role. For instance, a good pianist may achieve a desired emotional effect by slight local variations of tempo, dynamics, etc. Composers are able to do the same by applying subtle variations. Extreme examples of small gradual changes can be found, for instance, in minimal music (e.g. Reich, Glass, Riley). As a result, observed data consist of a dominating deterministic component plus many other very subtle (and presumably also deterministic, i.e. intended) components. Thus, because of their subtle nature, many musically relevant features are difficult to detect and can often be identified in a descriptive way only – for instance by suitable graphical displays. A formal statistical "proof" that these features are indeed real, and not just accidental, is then only possible if more similar data are collected.

To illustrate this, consider the tempo curves of three performances of Robert Schumann's (1810-1856) Träumerei by Vladimir Horowitz (1903-1989), displayed in Figure 2.2. It is obvious that the three curves are very similar, even with respect to small details. However, since these details are of a local nature and we observed only three performances, it is not an easy task to show formally (by statistical hypothesis testing or confidence intervals) that, apart from an overall smooth trend, Horowitz's tempo variations are not random. An even more difficult task is to "explain" these features, i.e. to attach an explicit musical meaning to the local tempo changes.


[Score: Träumerei op. 15, No. 7, piano, with tempo and "ritard." markings.]

Figure 2.1 Robert Schumann (1810-1856) – Träumerei op. 15, No. 7.


Figure 2.2 Tempo curves of Schumann's Träumerei performed by Vladimir Horowitz (recordings of 1947, 1963, and 1965; log(tempo) plotted against onset time).

2.2 Some descriptive statistics and plots for univariate data

2.2.1 Definitions

We give a brief summary of univariate descriptive statistics. For a comprehensive discussion, we refer the reader to standard textbooks such as Tukey (1977), Mosteller and Tukey (1977), Hoaglin (1977), Tufte (1977), Velleman and Hoaglin (1981), Chambers et al. (1983), and Cleveland (1985).

Suppose that we observe univariate data x1, x2, ..., xn. To summarize general characteristics of the data, various numerical summary statistics can be calculated. Essential features are, in particular, center (location), variability, asymmetry, shape of distribution, and location of unusual values (outliers). The most frequently used statistics are listed in Table 2.1.

We recall a few well-known properties of these statistics:

• Sample mean: The sample mean can be understood as the "center of gravity" of the data, whereas the median divides the sample in two halves


Table 2.1 Simple descriptive statistics

  Name                      Definition                                    Feature measured

  Empirical distribution    Fn(x) = n^(-1) ∑_{i=1}^{n} 1{xi ≤ x}          Proportion of obs. ≤ x
  function
  Minimum                   xmin = min{x1, ..., xn}                       Smallest value
  Maximum                   xmax = max{x1, ..., xn}                       Largest value
  Range                     xrange = xmax − xmin                          Total spread
  Sample mean               x̄ = n^(-1) ∑_{i=1}^{n} xi                     Center
  Sample median             M = inf{x : Fn(x) ≥ 1/2}                      Center
  Sample α-quantile         qα = inf{x : Fn(x) ≥ α}                       Border of lower 100α%
  Lower and upper           Q1 = q_{1/4}, Q2 = q_{3/4}                    Border of lower 25%,
  quartile                                                                upper 75%
  Sample variance           s² = (n − 1)^(-1) ∑_{i=1}^{n} (xi − x̄)²       Variability
  Sample standard           s = +√s²                                      Variability
  deviation
  Interquartile range       IQR = Q2 − Q1                                 Variability
  Sample skewness           m3 = n^(-1) ∑_{i=1}^{n} [(xi − x̄)/s]³         Asymmetry
  Sample kurtosis           m4 = n^(-1) ∑_{i=1}^{n} [(xi − x̄)/s]⁴ − 3     Flat/sharp peak

with an (approximately) equal number of observations. In contrast to the median, the mean is sensitive to outliers, since observations that are far from the majority of the data have a strong influence on its value.

• Sample standard deviation: The sample standard deviation is a measure of variability. In contrast to the variance, s is directly comparable with the data, since it is measured in the same units. If observations are drawn independently from the same normal probability distribution (or a distribution that is similar to a normal distribution), then the following rule of thumb applies: (a) approximately 68% of the data are in the interval x̄ ± s; (b) approximately 95% of the data are in the interval x̄ ± 2s; (c) almost all data are in the interval x̄ ± 3s. For a sufficiently large sample size, these conclusions can be carried over to the population from which the data were drawn.


• Interquartile range: The interquartile range also measures variability. Its advantage, compared to s, is that it is much less sensitive to outliers. If the observations are drawn from the same normal probability distribution, then IQR/1.35 (or, more precisely, IQR/[Φ⁻¹(0.75) − Φ⁻¹(0.25)], where Φ⁻¹ is the quantile function of the standard normal distribution) estimates the same quantity as s, namely the population standard deviation.

• Quantiles: For α = i/n (i = 1, ..., n), qα coincides with at least one observation. For other values of α, qα can be defined as in Table 2.1 or, alternatively, by interpolating neighboring observed values as follows: let β = i/n < α < γ = (i+1)/n. Then the interpolated quantile qα is defined by

qα = qβ + ((α − β)/(1/n)) (qγ − qβ)   (2.1)

Note that a slightly different convention used by some statisticians is to call inf{x : Fn(x) ≥ α} the (α − 0.5/n)-quantile (see e.g. Chambers et al. 1983).

• Skewness: Skewness measures symmetry/asymmetry. For exactly symmetric data, m3 = 0; for data with a long right tail, m3 > 0; for data with a long left tail, m3 < 0.

• Kurtosis: The kurtosis is mainly meaningful for unimodal distributions, i.e. distributions with one peak. For a sample from a normal distribution, m4 ≈ 0. The reason is that then E[(X − µ)⁴] = 3σ⁴, where µ = E(X). For samples from unimodal distributions with a sharper or flatter peak than the normal distribution, we tend to have m4 > 0 and m4 < 0, respectively.
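The statistics of Table 2.1 can be computed directly; a Python sketch (the function name `describe` is ours, and quantiles follow the inf{x : Fn(x) ≥ α} convention of the table):

```python
import math

def describe(xs):
    """Summary statistics of Table 2.1 for a univariate sample."""
    n = len(xs)
    srt = sorted(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    def q(alpha):
        # q_alpha = inf{x : Fn(x) >= alpha} = smallest order statistic
        # whose rank i satisfies i/n >= alpha
        return srt[max(math.ceil(alpha * n) - 1, 0)]
    m3 = sum(((x - mean) / s) ** 3 for x in xs) / n       # skewness
    m4 = sum(((x - mean) / s) ** 4 for x in xs) / n - 3   # excess kurtosis
    return {"mean": mean, "median": q(0.5), "s": s,
            "Q1": q(0.25), "Q2": q(0.75), "IQR": q(0.75) - q(0.25),
            "skewness": m3, "kurtosis": m4}

print(describe([1, 2, 3, 4, 5]))
```

For the symmetric sample 1, ..., 5 the skewness is 0 and the median equals the mean, as expected.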

Simple, but very useful graphical displays are:

• Histogram: 1. Divide an interval (a, b] that includes all observations into disjoint intervals I1 = (a1, b1], ..., Ik = (ak, bk]. 2. Let n1, ..., nk be the number of observations in the intervals I1, ..., Ik, respectively. 3. Above each interval Ij, plot a rectangle of width wj = bj − aj and height hj = nj/wj. Instead of the absolute frequencies, one can also use relative frequencies nj/n, where n = n1 + ... + nk. The essential point is that the area is proportional to nj. If the data are drawn from a probability distribution with density function f, then the histogram is an estimate of f.

• Kernel estimate of a density function: The histogram is a step function, and in that sense does not resemble most density functions. This can be improved as follows. If the data are realizations of a continuous random variable X with distribution F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u)du, then a smooth estimate of the probability density function f can be defined by a kernel estimate (Rosenblatt 1956, Parzen 1962, Silverman 1986) of the


form

f̂(x) = (nb)^(-1) ∑_{i=1}^{n} K((xi − x)/b)   (2.2)

where K(u) = K(−u) ≥ 0 and ∫_{−∞}^{∞} K(u)du = 1. Most kernels used in practice also satisfy the condition K(u) = 0 for |u| > 1. The "bandwidth" b then specifies which data in the neighborhood of x are used to estimate f(x). In situations where one has partial knowledge of the shape of f, one may incorporate this into the estimation procedure. For instance, Hjort and Glad (2002) combine parametric estimation based on a preliminary density function f(x; θ) with kernel smoothing of the "remaining density" f/f(x; θ). They show that major efficiency gains can be achieved if the preliminary model is close to the truth.

• Barchart: If data can assume only a few different values, or if data arequalitative (i.e. we only record which category an item belongs to), thenone can plot the possible values or names of categories on the x-axis andon the vertical axis the corresponding (relative) frequencies.

• Boxplot (simple version): 1. Calculate Q1, M, Q2 and IQR = Q2 − Q1. 2. Draw parallel lines (in principle of arbitrary length) at the levels Q1, M, Q2, A1 = Q1 − (3/2)IQR, A2 = Q2 + (3/2)IQR, B1 = Q1 − 3IQR, and B2 = Q2 + 3IQR. The points A1, A2 are called the inner fence, and B1, B2 the outer fence. 3. Identify the observation(s) between Q1 and A1 closest to A1 and draw a line connecting Q1 with this point. Do the same for Q2 and A2. 4. Identify observation(s) between A1 and B1 and draw points (or other symbols) at those places. Do the same for A2 and B2. 5. Draw points (or other symbols) for observations beyond B1 and B2, respectively. The boxplot can be interpreted as follows: the relative positions of Q1, M, Q2 and the inner and outer fences indicate symmetry or asymmetry. Moreover, the distance between Q1 and Q2 is the IQR and thus measures variability. The inner and outer fences help to identify outliers, i.e. values lying unusually far from most of the other observations.

• Q-q-plot for comparing two data sets x1, ..., xn and y1, ..., ym: 1. Define a certain number of points 0 < p1 < ... < pk ≤ 1 (the standard choice is pi = (i − 0.5)/N, where N = min(n, m)). 2. Plot the pi-quantiles (i = 1, ..., N) of the y-observations versus those of the x-observations. Alternative plots for comparing distributions are discussed e.g. in Ghosh and Beran (2000) and Ghosh (1996, 1999).
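Equation (2.2) can be sketched in Python; here we assume the Epanechnikov kernel K(u) = 0.75(1 − u²) for |u| ≤ 1, which satisfies the stated conditions (symmetric, non-negative, integrates to 1, vanishes for |u| > 1); the function name is ours:

```python
def kernel_density(x, data, b):
    """Kernel estimate of a density at x (Eq. 2.2) with bandwidth b."""
    def K(u):
        # Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], 0 elsewhere
        return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
    n = len(data)
    return sum(K((xi - x) / b) for xi in data) / (n * b)

# a single observation at 0 with bandwidth 1 reproduces the kernel itself:
print(kernel_density(0.0, [0.0], 1.0))   # 0.75
print(kernel_density(2.0, [0.0], 1.0))   # 0.0 (outside the kernel support)
```

Shrinking b concentrates the estimate near the observations; enlarging b smooths it out, which is the bias/variance trade-off controlled by the bandwidth.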


2.3 Specific applications in music – univariate

2.3.1 Tempo curves

Figure 2.3 displays 28 tempo curves for performances of Schumann's Träumerei op. 15, No. 7, by 24 pianists. The names of the pianists and dates of the recordings (in brackets) are Martha Argerich (before 1983), Claudio Arrau (1974), Vladimir Ashkenazy (1987), Alfred Brendel (before 1980), Stanislav Bunin (1988), Sylvia Capova (before 1987), Alfred Cortot (1935, 1947 and 1953), Clifford Curzon (about 1955), Fanny Davies (1929), Jörg Demus (about 1960), Christoph Eschenbach (before 1966), Reine Gianoli (1974), Vladimir Horowitz (1947, before 1963 and 1965), Cyprien Katsaris (1980), Walter Klien (date unknown), André Krust (about 1960), Antonin Kubalek (1988), Benno Moiseiwitsch (about 1950), Elly Ney (about 1935), Guiomar Novaes (before 1954), Cristina Ortiz (before 1988), Artur Schnabel (1947), Howard Shelley (before 1990), and Yakov Zak (about 1960).

Tempo is more likely to be varied in a relative rather than an absolute way. For instance, a musician may play a certain passage twice as fast as the previous one, but may care less about the exact absolute tempo. This suggests considering the logarithm of tempo. Moreover, the main interest lies in comparing the shapes of the curves. Therefore, the plotted curves consist of standardized logarithmic tempo (each curve has sample mean zero and variance one).

Schumann's Träumerei is divided into four main parts, each consisting of about eight bars, the first two and the last one being almost identical (see Figure 2.1). Thus, the structure is: A, A′, B, and A′′. Already a very simple exploratory analysis reveals interesting features. For each pianist, we calculate the following statistics for the four parts respectively: x̄, M, s, Q1, Q2, m3, and m4. Figures 2.4a through e show a distinct pattern that corresponds to the division into A, A′, B, and A′′. Tempo is much lower in A′′ and generally highest in B. Also, A′ seems to be played at a slightly slower tempo than A – though this distinction is not quite so clear (Figures 2.4a,b). Tempo is varied most towards the end and considerably less in the first half of the piece (Figure 2.4c). Skewness is generally negative, which is due to occasional extreme "ritardandi". This is most extreme in part B and, again, least pronounced in the first half of the piece (A, A′). A mirror image of this pattern, with the most extreme positive values in B, is observed for kurtosis. This indicates that in B (and also in A′′), most tempo values vary little around an average value, but occasionally extreme tempo changes occur. Also, for A, there are two outliers with an extremely negative skewness – these turn out to be Fanny Davies and Jörg Demus.

Figures 2.4f through h show another interesting comparison of boxplots. In Figure 2.4f, the differences between the lower quartiles in A and A′′ for performances before 1965 are compared with those from performances recorded in 1965 or later. The clear difference indicates that, at least for the


Figure 2.3 Twenty-eight tempo curves of Schumann's Träumerei performed by 24 pianists (log(tempo) plotted against onset time). (For Cortot and Horowitz, three tempo curves were available.)

sample considered here, pianists of the "modern era" tend to make a much stronger distinction between A and A′′ in terms of slow tempi. The only exceptions are Moiseiwitsch and Horowitz's first performance (outliers in the left boxplot) and Ashkenazy (outlier in the right boxplot). The comparison of skewness and kurtosis in Figures 2.4g and h also indicates that "modern" pianists seem to prefer occasional extreme ritardandi. The only exception in the "early 20th century group" is Artur Schnabel, with an extreme skewness of −2.47 and a kurtosis of 7.04.

Direct comparisons of tempo distributions are shown in Figures 2.5a


Figure 2.4 Boxplots of descriptive statistics for the 28 tempo curves in Figure 2.3.

through f. The following observations can be made: a) compared to Demus (quantiles on the horizontal axis), Ortiz has a few relatively extreme slow tempi (Figure 2.5a); b) similarly, but in a less extreme way, Cortot's interpretation includes occasional extremely slow tempo values (Figure 2.5b); c) Ortiz and Argerich have practically the same (marginal) distribution (Figure 2.5c); d) Figure 2.5d is similar to 2.5a and b, though less extreme; e) the tempo distribution of Cortot's performance (Figure 2.5e) did not change much in 1947 compared to 1935; f) similarly, Horowitz's tempo distributions in 1947 and 1963 are almost the same, except for slight changes for very low tempi (Figure 2.5f).
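The standardization applied to the plotted curves in Section 2.3.1 (log tempo, centered and scaled to unit sample variance) can be sketched as follows; the text does not specify whether the 1/n or 1/(n − 1) variance is meant, so we assume 1/n here, and the function name is ours:

```python
import math

def standardize_log_tempo(tempi):
    """Map a tempo curve to standardized log-tempo:
    sample mean 0 and (1/n) sample variance 1."""
    ys = [math.log(t) for t in tempi]
    n = len(ys)
    m = sum(ys) / n
    sd = math.sqrt(sum((y - m) ** 2 for y in ys) / n)
    return [(y - m) / sd for y in ys]

z = standardize_log_tempo([100.0, 80.0, 120.0, 60.0])
print(z)
```

After this transformation, only the shape of each curve matters, so performances at different absolute tempi become directly comparable.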

Figure 2.5 q-q-plots of several tempo curves (from Figure 2.3): (a) Demus (1960) – Ortiz (1988); (b) Demus (1960) – Cortot (1935); (c) Ortiz (1988) – Argerich (1983); (d) Demus (1960) – Krust (1960); (e) Cortot (1935) – Cortot (1947); (f) Horowitz (1947) – Horowitz (1963).

2.3.2 Notes modulo 12

In most classical music, a central tone around which notes "fluctuate" can be identified, and a small selected number of additional notes or chords (often triads) play a special role. For instance, from about 400 to 1500 A.D., music was mostly written using so-called modes. The main notes were the first one (finalis, the "final note") and the fifth note of the scale (dominant). The system of 12 major and 12 minor scales was developed later, adding more flexibility with respect to modulation and scales. The main "representatives" of a major/minor scale are three triads, obtained by "adding" thirds, starting at the basic note corresponding to the first (tonic), fourth (subdominant), and fifth (dominant) note of the scale, respectively. Other triads are also – but to a lesser degree – associated with the properties "tonic", "subdominant", and/or "dominant". In the 20th century, and partially already in the late 19th century, other systems of scales, as well as systems that do not rely on any specific scales, were proposed (in particular 12-tone music).

Figure 2.6 Frequencies of notes 0, 1, ..., 11 ((notes − tonic) mod 12) for moving windows of onset-length 16: (a) J.S. Bach – Fugue 1, frequencies of notes number i, i in [1+j, 16+j] (j = 0, ..., 64); (b) W.A. Mozart – KV 545 (j = 0, ..., 64); (c) R. Schumann – op. 15/2 (j = 0, ..., 64); (d) R. Schumann – op. 15/3 (j = 0, ..., 65).

A very simple illustration of this development can be obtained by counting the frequencies of notes (pitches) in the following way: consider a score in equal temperament. Ignoring transposition by octaves, we can represent all notes x(t1), ..., x(tn) by the integers 0, 1, ..., 11. Here, t1 ≤ t2 ≤ ... ≤ tn

©2004 CRC Press LLC


[Figure 2.7 Frequencies of notes 0, 1, ..., 11 for moving windows of onset-length 16. Panels: (a) A. Scriabin - op. 51/2; (b) A. Scriabin - op. 51/4; (c) F. Martin - Prelude 6; (d) F. Martin - Prelude 7; each panel shows frequencies of notes number i, i in [1+j, 16+j] (j = 0, ..., 64). x-axis: (Notes - Tonic) mod 12; y-axis: relative frequency.]

denote the score-onset times of the notes. To make different compositions comparable, the notes are centered by subtracting the central note, which is defined to be the most frequent note. Given a prespecified integer k (in our case k = 16), we calculate the relative frequencies

$$p_j(x) = (2k+1)^{-1} \sum_{i=j}^{j+2k} 1\{x(t_i) = x\}$$

where 1{x(ti) = x} = 1 if x(ti) = x and zero otherwise, and j = 1, 2, ..., n − 2k − 1. This means that we calculate the distribution of notes for a moving window of 2k + 1 notes. Figures 2.6a through d and 2.7a through d display the distributions pj(x) (j = 4, 8, ..., 64) for the following compositions: Fugue 1 from “Das Wohltemperierte Klavier I” by J.S. Bach (1685-1750), Sonata KV 545 (first movement) by W.A. Mozart (1756-1791; Figure 2.8), Kinderszenen No. 2 and 3 by R. Schumann (1810-1856; Figure 2.9), Preludes op. 51, No. 2 and 4 by A. Scriabin (1872-1915), and Preludes No.


Figure 2.8 Johannes Chrysostomus Wolfgangus Theophilus Mozart (1756-1791) in the house of Salomon Gessner in Zurich. (Courtesy of Zentralbibliothek Zurich.)


Figure 2.9 R. Schumann (1810-1856) – lithography by H. Bodmer. (Courtesy of Zentralbibliothek Zurich.)


6 and 7 by F. Martin (1890-1971). For each j = 4, 8, ..., 64, the frequencies pj(0), ..., pj(11) are joined by lines respectively. The obvious common feature for Bach, Mozart, and Schumann is a distinct preference (local maximum) for the notes 5 and 7 (apart from 0). Note that if 0 is the root of the tonic triad, then 5 corresponds to the root of the subdominant triad. Similarly, 7 is the root of the dominant triad. Also relatively frequent are the notes 3 = minor third (second note of the tonic triad in minor) and 10 = minor seventh, which is the fourth note of the dominant seventh chord to the subdominant. Note also that, for Schumann, the local maxima are somewhat less pronounced. A different pattern can be observed for Scriabin, and even more so for Martin. In Scriabin’s Prelude op. 51/2, the perfect fifth almost never occurs, but instead the major sixth is very frequent. In Scriabin’s Prelude op. 51/4, the tonal system is dissolved even further, as the clearly dominating note is 6, which together with 0 builds the augmented fourth (or diminished fifth) – an interval that is considered highly dissonant in tonal music. Nevertheless, even in Scriabin’s compositions the distribution of notes does not change very rapidly, since the sixteen overlaid curves are almost identical. This may indicate that the notion of scales, or a slow harmonic development, still plays a role. In contrast, in Frank Martin’s Prelude No. 6, the distribution changes very quickly. This is hardly surprising, since Martin’s style incorporates, among other influences, dodecaphonism (12-tone music) – a compositional technique that does not impose traditional restrictions on the harmonic structure.
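The moving-window counting described above can be sketched in a few lines of code. This is our own illustrative implementation with an invented toy note sequence, not the author's program; the function name and the 0-based window index are our choices.

```python
# Sketch of the moving-window pitch-class frequencies p_j(x):
# p_j(x) = (2k+1)^{-1} * sum over the window of 1{x(t_i) = x}.
# Notes are assumed already reduced mod 12 and centered at the tonic.

def window_frequencies(notes, k=16):
    """For each window of 2k+1 consecutive notes, return the relative
    frequency of each pitch class 0..11 (0-based window index j)."""
    width = 2 * k + 1
    out = []
    for j in range(len(notes) - width + 1):
        window = notes[j:j + width]
        out.append([window.count(x) / width for x in range(12)])
    return out

# Toy example: a repeating C-major-ish pattern (0 = tonic, 7 = dominant).
notes = [0, 4, 7, 0, 5, 7, 0, 2, 4] * 20
p = window_frequencies(notes, k=16)
print(len(p), max(p[0]))
```

In a stable tonal context the rows of `p` barely change from window to window, which is exactly the behavior described for Bach, Mozart, and Schumann above.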

2.4 Some descriptive statistics and plots for bivariate data

2.4.1 Definitions

We give a short overview of important descriptive concepts for bivariate data. For a comprehensive treatment we refer the reader to the standard textbooks given above (also see e.g. Plackett 1960, Ryan 1996, Srivastava and Sen 1997, Draper and Smith 1998, and Rao 1973 for basic theoretical results).

Correlation

If each observation consists of a pair of measurements (xi, yi), then the main objective is to investigate the relationship between x and y. Consider, for example, the case where both variables are quantitative. The data can then be displayed in a scatter plot (y versus x). Useful statistics are Pearson’s sample correlation

$$r = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \quad (2.3)$$


where $s_x^2 = n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and $s_y^2 = n^{-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$, and Spearman’s rank correlation

$$r_{Sp} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{u_i - \bar{u}}{s_u}\right)\left(\frac{v_i - \bar{v}}{s_v}\right) = \frac{\sum_{i=1}^{n}(u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{n}(u_i - \bar{u})^2 \sum_{i=1}^{n}(v_i - \bar{v})^2}} \quad (2.4)$$

where ui denotes the rank of xi among the x-values and vi is the rank of yi among the y-values. In (2.3) and (2.4) it is assumed that sx, sy, su, and sv are not zero. Recall that these definitions imply the following properties: a) −1 ≤ r, rSp ≤ 1; b) r = 1 if and only if yi = β0 + β1xi with β1 > 0 (exact linear relationship with positive slope); c) r = −1 if and only if yi = β0 + β1xi with β1 < 0 (exact linear relationship with negative slope); d) rSp = 1 if and only if xi > xj implies yi > yj (strictly monotonically increasing relationship); e) rSp = −1 if and only if xi > xj implies yi < yj (strictly monotonically decreasing relationship); f) r measures the strength (and sign) of the linear relationship; g) rSp measures the strength (and sign) of monotonicity; h) if the data are realizations of a bivariate random variable (X, Y), then r is an estimate of the population correlation ρ = cov(X, Y)/√(var(X)var(Y)), where cov(X, Y) = E[XY] − E[X]E[Y], var(X) = cov(X, X) and var(Y) = cov(Y, Y). When using these measures of dependence one should bear in mind that each of them measures a specific type of dependence only, namely linear and monotonic dependence respectively. Thus, a Pearson or Spearman correlation near or equal to zero does not necessarily mean independence. Note also that correlation can be interpreted in a geometric way as follows: defining the n-dimensional vectors x = (x1, ..., xn)t and y = (y1, ..., yn)t, r is equal to the standardized scalar product between x and y, and is therefore equal to the cosine of the angle between these two (centered) vectors.

A special type of correlation is interesting for time series. Time series are

data that are taken in a specific ordered (usually temporal) sequence. If Y1, Y2, ..., Yn are random variables observed at time points i = 1, ..., n, then one would like to know whether there is any linear dependence between observations Yi and Yi−k, i.e. between observations that are k time units apart. If this dependence is the same for all time points i, and the expected value of Yi is constant, then the corresponding population correlation can be written as a function of k only (see Chapter 4),

$$\frac{\mathrm{cov}(Y_i, Y_{i+k})}{\sqrt{\mathrm{var}(Y_i)\,\mathrm{var}(Y_{i+k})}} = \rho(k) \quad (2.5)$$

and a simple estimate of ρ(k) is the sample autocorrelation (acf)

ρ(k) =1n

n−k∑i=1

(yi − y

s)(yi+k − y

s) (2.6)

where s2 = n−1∑(yi − y)(yi+k − y). Note that here summation stops at


n − k, because no data are available beyond (n − k) + k = n. For large lags (large compared to n), ρ̂(k) is not a very precise estimate, since there are only very few pairs that are k time units apart.

The definition of ρ(k) and ρ̂(k) can be extended to multivariate time series, taking into account that dependence between different components of the series may be delayed. For instance, for a bivariate time series (Xi, Yi) (i = 1, 2, ...), one considers lag-k sample cross-correlations

$$\hat{\rho}_{XY}(k) = \frac{1}{n}\sum_{i=1}^{n-k}\left(\frac{x_i - \bar{x}}{s_X}\right)\left(\frac{y_{i+k} - \bar{y}}{s_Y}\right) \quad (2.7)$$

as estimates of the population cross-correlations

$$\rho_{XY}(k) = \frac{\mathrm{cov}(X_i, Y_{i+k})}{\sqrt{\mathrm{var}(X_i)\,\mathrm{var}(Y_{i+k})}} \quad (2.8)$$

where $s_X^2 = n^{-1}\sum (x_i - \bar{x})^2$ and $s_Y^2 = n^{-1}\sum (y_i - \bar{y})^2$. If |ρ̂XY(k)| is high, then there is a strong linear dependence between Xi and Yi+k.
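As a concrete sketch of (2.3), (2.4), and (2.6) (our own plain-Python code with invented toy series, not taken from the book; the no-ties ranking is a simplification), these statistics can be computed as follows:

```python
# Illustrative implementations of Pearson's r, Spearman's r_Sp,
# and the sample autocorrelation; toy data are invented.
from math import sqrt

def pearson(x, y):
    """Sample correlation r as in (2.3)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def ranks(v):
    """Rank 1 for the smallest value (assumes no ties, as in the toy data)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Rank correlation r_Sp as in (2.4): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def acf(y, k):
    """Sample autocorrelation (2.6), with s^2 the sample variance."""
    n = len(y)
    m = sum(y) / n
    s2 = sum((v - m) ** 2 for v in y) / n
    return sum((y[i] - m) * (y[i + k] - m) for i in range(n - k)) / (n * s2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]       # strictly increasing but not linear
print(round(pearson(x, y), 3), spearman(x, y))   # r < 1, but r_Sp = 1
z = [1.0, -1.0] * 50                  # alternating series
print(round(acf(z, 1), 2), round(acf(z, 2), 2))  # strong negative lag-1 acf
```

The example illustrates properties d) and f) above: the quadratic relationship is perfectly monotone (rSp = 1) but not perfectly linear (r < 1), and the alternating series has a strongly negative lag-1 autocorrelation.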

Regression

In addition to measuring the strength of dependence between two variables, one is often interested in finding an explicit functional relationship. For instance, it may be possible to express the response variable y in terms of an explanatory variable x by y = g(x, ε), where ε is a variable representing the part of y that is unexplained. More specifically, we may have, for example, an additive relationship y = g(x) + ε or a multiplicative equation y = g(x)e^ε. The simplest relationship is given by the simple linear regression equation

$$y = \beta_0 + \beta_1 x + \varepsilon \quad (2.9)$$

where ε is assumed to be a random variable with E(ε) = 0 (and usually finite variance σ² = var(ε) < ∞). Thus, the data are yi = β0 + β1xi + εi (i = 1, ..., n), where the εi's are generated by the same zero-mean distribution. Often the εi's are also assumed to be uncorrelated or even independent – this is, however, not a necessary assumption. An obvious estimate of the unknown parameters β0 and β1 is obtained by minimizing the total sum of squared errors

$$SSE = SSE(b_0, b_1) = \sum (y_i - b_0 - b_1 x_i)^2 = \sum r_i^2(b_0, b_1) \quad (2.10)$$

with respect to b0, b1. The solution is found by setting the partial derivatives with respect to b0 and b1 equal to zero. A more elegant way to find the solution is obtained by interpreting the problem geometrically: defining the n-dimensional vectors 1 = (1, ..., 1)^t, b = (b0, b1)^t and the n × 2 matrix X with columns 1 and x, we have SSE = ||y − b0·1 − b1·x||² = ||y − Xb||²


where ||·|| denotes the Euclidean norm, or length, of a vector. It is then clear that SSE is minimized by the orthogonal projection of y on the plane spanned by 1 and x. The estimate of β = (β0, β1)^t is therefore

$$\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)^t = (X^t X)^{-1} X^t y \quad (2.11)$$

and the projection – which is the vector of fitted values ŷi – is given by

$$\hat{y} = (\hat{y}_1, ..., \hat{y}_n)^t = X(X^t X)^{-1} X^t y \quad (2.12)$$

Defining the measure of the total variability of y, SST = ||y − ȳ1||² (total sum of squares), and the quantities SSR = ||ŷ − ȳ1||² (regression sum of squares = variability due to the fact that the fitted line is not horizontal) and SSE = ||y − ŷ||² (error sum of squares, variability unexplained by the regression line), we have by Pythagoras

$$SST = SSR + SSE \quad (2.13)$$

The proportion of variability “explained” by the regression line ŷ = β̂0 + β̂1x is therefore

$$R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = \frac{||\hat{y} - \bar{y}\mathbf{1}||^2}{||y - \bar{y}\mathbf{1}||^2} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}. \quad (2.14)$$

By definition, 0 ≤ R² ≤ 1, and R² = 1 if and only if yi = ŷi for all i (i.e. all points are on the regression line). Moreover, for simple regression we also have R² = r². The advantage of defining R² as above (instead of via r²) is that the definition remains valid for the multiple regression model (see below), i.e. when several explanatory variables are available. Finally, note that an estimate of σ² is obtained by σ̂² = (n − 2)^{-1} Σ ri²(β̂0, β̂1).
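For simple regression, the normal equations can be solved directly without matrix algebra. The following sketch (our own code with an invented exact-line example) computes β̂0, β̂1, and R² as in (2.11) and (2.14):

```python
# Sketch of simple linear least squares and R^2 = 1 - SSE/SST.
# The closed-form slope below is the scalar solution of the normal
# equations, equivalent to (X^t X)^{-1} X^t y for one regressor.

def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    yhat = [b0 + b1 * a for a in x]
    sst = sum((b - my) ** 2 for b in y)          # total sum of squares
    sse = sum((b - f) ** 2 for b, f in zip(y, yhat))  # error sum of squares
    return b0, b1, 1.0 - sse / sst               # R^2

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]                         # exactly y = 1 + 2x
b0, b1, r2 = ols(x, y)
print(b0, b1, r2)                                # exact fit: R^2 = 1
```

Because the toy data lie exactly on a line, all points satisfy yi = ŷi and R² = 1, illustrating the boundary case discussed above.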

In analogy to the sample mean and the sample variance, the least squares estimates of the regression parameters are sensitive to the presence of outliers. Outliers in regression can occur in the y-variable as well as in the x-variable. The latter are also called influential points. Outliers may often be correct and in fact very interesting observations (e.g. telling us that the assumed model may not be correct). However, since least squares estimates are highly influenced by outliers, it is often difficult to notice that there may be a problem, since the fitted curve tends to lie close to the outliers. Alternative, robust estimates can be helpful in such situations (see Huber 1981, Hampel et al. 1986). For instance, instead of minimizing the residual sum of squares we may minimize Σ ρ(ri), where ρ is a bounded function.

If ρ is differentiable, then the solution can usually also be found by solving the equations

$$\sum_{i=1}^{n} \rho'\!\left(\frac{r_i}{\hat{\sigma}}\right) \frac{\partial}{\partial b_j}\, r_i(b) = 0 \quad (j = 0, ..., p) \quad (2.15)$$

where σ̂² is a robust estimate of σ² obtained from an additional equation, and p is the number of explanatory variables. This leads to estimates that


are (up to a certain degree) robust with respect to outliers in y, not however with respect to influential points (outliers in x). To control the effect of influential points one can, for instance, solve a set of equations

$$\sum_{i=1}^{n} \psi_j\!\left(\frac{r_i}{\hat{\sigma}}, x_i\right) = 0 \quad (j = 0, ..., p) \quad (2.16)$$

where ψ is such that it downweighs outliers in x as well. For a comprehensive theory of robustness see e.g. Huber (1981), Hampel et al. (1986). For more recent, efficient and highly robust methods see Yohai (1987), Rousseeuw and Yohai (1984), Gervini and Yohai (2002), and references therein.

The results for simple linear regression can be extended easily to the case where more than one explanatory variable is available. The multiple linear regression model with p explanatory variables is defined by y = β0 + β1x1 + ... + βpxp + ε. For data we write yi = β0 + β1xi1 + ... + βpxip + εi (i = 1, ..., n). Note that the word “linear” refers to linearity in the parameters β0, ..., βp. The function itself can be nonlinear. For instance, we may have polynomial regression with y = β0 + β1x + ... + βpx^p + ε. The same geometric arguments as above apply, so that (2.11) and (2.12) hold with β = (β0, ..., βp)^t and the n × (p+1) matrix X = (x^(1), ..., x^(p+1)) with columns x^(1) = 1 and x^(j+1) = xj = (x1j, ..., xnj)^t (j = 1, ..., p).

Regression smoothing

A more general, but more difficult, approach to modeling a functional relationship is to impose less restrictive assumptions on the function g. For instance, we may assume

y = g(x) + ε (2.17)

with g being a twice continuously differentiable function. Under suitable additional conditions on x and ε it is then possible to estimate g from observed data by nonparametric smoothing. As a special example consider observations yi taken at time points i = 1, 2, ..., n. A standard model is

yi = g(ti) + εi (2.18)

where ti = i/n, and the εi are independent identically distributed (iid) random variables with E(εi) = 0 and σ² = var(εi) < ∞. The reason for using standardized time ti ∈ [0, 1] is that this way g is observed on an increasingly fine grid. This makes it possible to ultimately estimate g(t) for all values of t by using neighboring values ti, provided that g is not too “wild”. A simple estimate of g can be obtained, for instance, by a weighted average (kernel smoothing)

$$\hat{g}(t) = \sum_{i=1}^{n} w_i y_i \quad (2.19)$$

©2004 CRC Press LLC

Page 48: Statistics in Musicology

with suitable weights wi ≥ 0, Σ wi = 1. For example, one may use the Nadaraya-Watson weights

$$w_i = w_i(t; b, n) = \frac{K\!\left(\frac{t - t_i}{b}\right)}{\sum_{j=1}^{n} K\!\left(\frac{t - t_j}{b}\right)} \quad (2.20)$$

with b > 0, and a kernel function K ≥ 0 such that K(u) = K(−u), K(u) = 0 for |u| > 1, and $\int_{-1}^{1} K(u)\,du = 1$. The role of b is to restrict the observations that influence the estimate to a small window of neighboring time points. For instance, the rectangular kernel K(u) = ½·1{|u| ≤ 1} yields the sample mean of the observations yi in the “window” n(t − b) ≤ i ≤ n(t + b). An even more elegant formula can be obtained by approximating the Riemann sum $\frac{1}{nb}\sum_{j=1}^{n} K\!\left(\frac{t - t_j}{b}\right)$ by the integral $\int_{-1}^{1} K(u)\,du = 1$:

$$\hat{g}(t) = \sum_{i=1}^{n} w_i y_i = \frac{1}{nb}\sum_{i=1}^{n} K\!\left(\frac{t - t_i}{b}\right) y_i \quad (2.21)$$

In this case, the sum of the weights is not exactly equal to one, but asymptotically (as n → ∞ and b → 0 such that nb³ → ∞) this error is negligible. It can be shown that, under fairly general conditions on g and ε, ĝ converges to g, in a certain sense that depends on the specific assumptions (see e.g. Gasser and Müller 1979, Gasser and Müller 1984, Härdle 1991, Beran and Feng 2002, Wand and Jones 1995, and references therein).

An alternative to kernel smoothing is local polynomial fitting (Fan and

Gijbels 1995, 1996; also see Feng 1999). The idea is to fit a polynomial locally, i.e. to data in a small neighborhood of the point of interest. This can be formulated as a weighted least squares problem as follows:

$$\hat{g}(t) = \hat{\beta}_0 \quad (2.22)$$

where β̂ = (β̂0, β̂1, ..., β̂p)^t solves a local least squares problem defined by

$$\hat{\beta} = \arg\min_{a} \sum K\!\left(\frac{t_i - t}{b}\right) r_i^2(a). \quad (2.23)$$

Here ri = yi − [a0 + a1(ti − t) + ... + ap(ti − t)^p], K is a kernel as above, and b > 0 is the bandwidth defining the window of neighboring observations. It can be shown that, asymptotically, a local polynomial smoother can be written as a kernel estimator (Ruppert and Wand 1994). A difference only occurs at the borders (t close to 0 or 1) where, in contrast to the local polynomial estimate, the kernel smoother has to be modified. The reason is that observations are no longer symmetrically spaced in the window t ± b. A major advantage of local polynomials is that they automatically provide estimates of derivatives, namely ĝ′(t) = β̂1, ĝ′′(t) = 2β̂2, etc. Kernel smoothing can also be used for estimation of derivatives; however, different (and rather complicated) kernels have to be used for each derivative (Gasser and Müller 1984, Gasser et al. 1985). A third alternative, so-called wavelet


thresholding, will not be discussed here (see e.g. Daubechies 1992, Donoho and Johnstone 1995, 1998, Donoho et al. 1995, 1996, Vidakovic 1999, Percival and Walden 2000, and references therein). A related method based on wavelets is discussed in Chapter 5.
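The kernel smoother (2.19)-(2.21) and the local polynomial fit (2.22)-(2.23) can be made concrete with a small self-contained sketch. This is our own code and toy data, not the book's: the rectangular and Epanechnikov kernels, the bandwidths, and the noiseless linear "signal" are illustrative choices.

```python
# Sketch: Nadaraya-Watson smoothing with a rectangular kernel, and a
# local linear (degree p = 1) fit via the weighted normal equations.

def K_rect(u):
    return 0.5 if abs(u) <= 1 else 0.0        # rectangular kernel

def K_epa(u):
    return max(0.0, 0.75 * (1.0 - u * u))     # Epanechnikov kernel

def nw(t, ts, ys, b):
    """Nadaraya-Watson estimate ghat(t) with weights as in (2.20)."""
    w = [K_rect((t - ti) / b) for ti in ts]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def local_linear(t, ts, ys, b):
    """Local linear fit: returns (ghat(t), ghat'(t)) = (a0_hat, a1_hat)."""
    w = [K_epa((ti - t) / b) for ti in ts]
    s0 = sum(w)
    s1 = sum(wi * (ti - t) for wi, ti in zip(w, ts))
    s2 = sum(wi * (ti - t) ** 2 for wi, ti in zip(w, ts))
    m0 = sum(wi * yi for wi, yi in zip(w, ys))
    m1 = sum(wi * (ti - t) * yi for wi, ti, yi in zip(w, ts, ys))
    det = s0 * s2 - s1 * s1
    return (s2 * m0 - s1 * m1) / det, (s0 * m1 - s1 * m0) / det

n = 20
ts = [(i + 1) / n for i in range(n)]          # standardized times
ys = [3.0 + 2.0 * ti for ti in ts]            # noiseless linear "signal"
g, dg = local_linear(0.5, ts, ys, b=0.2)
print(round(nw(0.5, ts, ys, b=0.16), 3), round(g, 3), round(dg, 3))
```

On exactly linear data the local linear fit recovers the function and its derivative without bias, while the rectangular-kernel smoother returns the local sample mean, matching the interpretation given for the rectangular kernel above.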

Smoothing of two-dimensional distributions, sharpening

Estimating a relationship between x and y (where x and y are realizations of random variables X and Y respectively) amounts to estimating the joint two-dimensional distribution function F(x, y) = P(X ≤ x, Y ≤ y). For continuous variables with $F(x, y) = \int_{u \le x}\int_{v \le y} f(u, v)\,du\,dv$, the density function f can be estimated, for instance, by a two-dimensional histogram. For visual and theoretical reasons, a better estimate is obtained by kernel estimation (see e.g. Silverman 1986) defined by

$$\hat{f}(x, y) = \frac{1}{n b_1 b_2} \sum_{i=1}^{n} K(x_i - x, y_i - y;\, b_1, b_2) \quad (2.24)$$

where the kernel K is such that K(u, v) = K(−u, v) = K(u, −v) ≥ 0, and ∫∫ K(u, v) du dv = 1. Usually, b1 = b2 = b and K(u, v) has compact support. Examples of kernels are K(u, v) = ¼·1{|u| ≤ 1}·1{|v| ≤ 1} (rectangular kernel with rectangular support), K(u, v) = π^{-1}·1{u² + v² ≤ 1} (rectangular kernel with circular support), K(u, v) = 2π^{-1}[1 − u² − v²]·1{u² + v² ≤ 1} (Epanechnikov kernel with circular support), or K(u, v) = (2π)^{-1} exp[−½(u² + v²)] (normal density kernel with infinite support). In analogy to one-dimensional density estimation, it can be shown that under mild regularity conditions, f̂(x, y) is a consistent estimate of f(x, y), provided that b1, b2 → 0 and nb1, nb2 → ∞.

Graphical representations of two-dimensional distribution functions are:

• 3-dimensional perspective plot: z = f(x, y) (or f̂(x, y)) is plotted against x and y;

• contour plot: like in a geographic map, curves corresponding to equal levels of f are drawn in the x-y plane;

• image plot: coloring of the x-y plane, with the color at point (x, y) corresponding to the value of f.

A simple way of enhancing the visual understanding of scatterplots is so-called sharpening (Tukey and Tukey 1981; also see Chambers et al. 1983): for given numbers a and b, only points with a ≤ f̂(x, y) ≤ b are drawn in the scatterplot. Alternatively, one may plot all points and highlight those with a ≤ f̂(x, y) ≤ b.
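A bivariate kernel estimate as in (2.24) can be sketched in a few lines. The code below is our own illustration with invented data, using the normal density kernel listed above and b1 = b2 = b:

```python
# Sketch of the 2D kernel density estimate (2.24) with the normal kernel
# K(u, v) = (2*pi)^{-1} exp(-(u^2 + v^2)/2); toy data, equal bandwidths.
from math import exp, pi

def kde2(x, y, xs, ys, b):
    n = len(xs)
    s = sum(exp(-(((xi - x) / b) ** 2 + ((yi - y) / b) ** 2) / 2.0)
            for xi, yi in zip(xs, ys)) / (2.0 * pi)
    return s / (n * b * b)

# Four points clustered near (0, 0) plus one isolated point at (2, 2).
xs = [0.0, 0.1, -0.1, 0.05, 2.0]
ys = [0.0, -0.1, 0.1, 0.0, 2.0]
dense = kde2(0.0, 0.0, xs, ys, b=0.5)
sparse = kde2(2.0, 2.0, xs, ys, b=0.5)
print(dense > sparse)       # density is higher at the cluster
```

With thresholds a and b chosen between `sparse` and `dense`, the sharpening rule above would draw only the clustered points and drop the isolated one.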

Interpolation

Often a process may be generated in continuous time, but is observed at discrete time points. One may then wish to guess the values of the points


in between. Kernel and local polynomial smoothing provide this possibility, since ĝ(t) can be calculated for any t ∈ (0, 1). Alternatively, if the observations are assumed to be completely without “error”, i.e. yi = g(ti), then deterministic interpolation can be used. The most popular method is spline interpolation. For instance, cubic splines connect neighboring observed values yi−1, yi by cubic polynomials such that the first and second derivatives at the endpoints ti−1, ti are equal. For observations y1, ..., yn at equidistant time points ti with ti − ti−1 = tj − tj−1 = Δt (i, j = 1, ..., n), we have n − 1 polynomials

$$p_i(t) = a_i + b_i(t - t_i) + c_i(t - t_i)^2 + d_i(t - t_i)^3 \quad (i = 1, ..., n-1) \quad (2.25)$$

To achieve smoothness at the points ti where two polynomials pi−1, pi meet, one imposes the condition that the polynomials and their first two derivatives are equal at ti. This, together with the conditions pi(ti) = yi, leads to a system of 3(n − 2) + n = 4(n − 1) − 2 equations for the 4(n − 1) parameters ai, bi, ci, di (i = 1, ..., n − 1). To specify a unique solution one therefore needs two additional conditions at the border. A typical assumption is p′′(t1) = p′′(tn) = 0, which defines so-called natural splines. Cubic splines have a physical meaning, since these are the curves that form when a thin rod is forced to pass through n knots (in our case the knots are t1, ..., tn), corresponding to minimum strain energy. The term “spline” refers to the thin flexible rods that were used in the past by draftsmen to draw smooth curves in ship design. In spite of their “natural” meaning, interpolation splines (and similarly other methods of interpolation) can be problematic, since the interpolated values may be highly dependent on the specific method of interpolation and are therefore purely hypothetical unless the aim is indeed to build a ship.

Splines can also be used for smoothing purposes by removing the restriction that the curve has to go through all observed points. More specifically, one looks for a function g(t) such that

$$V(\lambda) = \sum_{i=1}^{n} (y_i - g(t_i))^2 + \lambda \int_{-\infty}^{\infty} [g''(t)]^2\,dt \quad (2.26)$$

is minimized. The parameter λ > 0 controls the smoothness of the resulting curve. For small values of λ, the fitted curve will be rather rough but close to the data; for large values more smoothness is achieved, but the curve is, in general, not as close to the data. The question of which λ to choose reflects a standard dilemma in statistical smoothing: one needs to balance the aim of achieving a small bias (λ small) against the aim of a small variance (λ large). For a given value of λ, the solution to the minimization problem above turns out to be a natural cubic spline (see Reinsch 1967; also see Wahba 1990 and references therein). The solution can also be written as a kernel smoother with a kernel function K(u) proportional


to exp(−|u|/√2) sin(π/4 + |u|/√2) and a bandwidth b proportional to λ^{1/4} (Silverman 1986). If ti = i/n, then the bandwidth is exactly equal to λ^{1/4}.
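The natural cubic spline interpolation described earlier can be sketched directly: for equidistant knots, one solves the standard tridiagonal system for the second derivatives M_i (with M = 0 at both ends, the "natural" border condition) and evaluates piecewise. This is our own minimal implementation for illustration; in practice a library routine would be used.

```python
# Sketch: natural cubic spline interpolation on equidistant knots.

def natural_spline_moments(ys, dt=1.0):
    """Second derivatives M_i at the knots (M = 0 at both ends), solving
    M[i-1] + 4*M[i] + M[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1])/dt^2
    by the Thomas algorithm for tridiagonal systems."""
    n = len(ys)
    m = n - 2
    b = [4.0] * m
    d = [6.0 * (ys[i - 1] - 2.0 * ys[i] + ys[i + 1]) / dt ** 2
         for i in range(1, n - 1)]
    for i in range(1, m):                      # forward elimination
        w = 1.0 / b[i - 1]
        b[i] -= w
        d[i] -= w * d[i - 1]
    M = [0.0] * n
    for i in range(m - 1, -1, -1):             # back substitution
        M[i + 1] = (d[i] - M[i + 2]) / b[i]
    return M

def spline_eval(ys, M, t, dt=1.0):
    """Evaluate the interpolating spline at t (knots at 0, dt, 2*dt, ...)."""
    n = len(ys)
    i = min(max(int(t / dt), 0), n - 2)
    t0, t1 = i * dt, (i + 1) * dt
    return ((M[i] * (t1 - t) ** 3 + M[i + 1] * (t - t0) ** 3) / (6.0 * dt)
            + (ys[i] / dt - M[i] * dt / 6.0) * (t1 - t)
            + (ys[i + 1] / dt - M[i + 1] * dt / 6.0) * (t - t0))

ys = [0.0, 1.0, 0.0, 1.0, 0.0]                 # toy values at the knots
M = natural_spline_moments(ys)
print(round(spline_eval(ys, M, 1.0), 6))       # reproduces the knot value
```

Evaluating between the knots (e.g. at t = 1.5) illustrates the caveat above: the interpolated value depends entirely on the cubic-polynomial assumption, not on any additional data.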

Statistical inference

In this section, correlation, linear regression, nonparametric smoothing, and interpolation were introduced in an informal way, without exact discussion of probabilistic assumptions and statistical inference. All these techniques can be used in an informal way to explore possible structures without specific model assumptions. Sometimes, however, one wishes to obtain more solid conclusions by statistical tests and confidence intervals. There is an enormous literature on statistical inference in regression, including nonparametric approaches. For selected results see the references given above. For nonparametric methods also see Wand and Jones (1995), Simonoff (1996), Bowman and Azzalini (1997), Eubank (1999), and references therein.

2.5 Specific applications in music – bivariate

2.5.1 Empirical tempo-acceleration

Consider the tempo curves in Figure 2.3. An approximate measure of tempo-acceleration may be defined by

$$a(t_i) = \frac{\Delta^2 y(t)}{\Delta^2 t} = \frac{[y(t_i) - y(t_{i-1})] - [y(t_{i-1}) - y(t_{i-2})]}{[t_i - t_{i-1}] - [t_{i-1} - t_{i-2}]} \quad (2.27)$$

where y(t) is the tempo (or log-tempo) at time t. Figures 2.10a through f show a(t) for the three performances each by Cortot and Horowitz. From the pictures it is not easy to see to what extent there are similarities or differences. Consider now the pairs (aj(ti), al(ti)), where aj, al are acceleration measurements of performances j and l respectively. We calculate the sample correlations for each pair (j, l) ∈ {1, ..., 28} × {1, ..., 28}, (j ≠ l). Figure 2.11a shows the correlations between Cortot 1 (1935) and the other performances. As expected, Cortot correlates best with Cortot: the correlation between Cortot 1 and Cortot’s other two performances (1947, 1953) is clearly highest. The analogous observation can be made for Horowitz 1 (1947) (Figure 2.11b). It is also interesting to compare how much overall resemblance there is between a selected performance and the other performances. For each of the 28 performances, the average and the maximal correlation with the other performances were calculated. Figures 2.11c and d indicate that, in terms of acceleration, Cortot’s style appears to be quite unique among the pianists considered here. The overall (average and maximal) similarity between each of his three acceleration curves and the other performances is much smaller than for any other pianist.
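The comparison above can be sketched in code. The version below is our own simplification with invented tempo values: for equidistant onset times the second difference reduces to [y(ti) − 2y(ti−1) + y(ti−2)]/Δt², and two acceleration series are then compared via the sample correlation (2.3).

```python
# Sketch: second-difference acceleration for equidistant onset times,
# then the Pearson correlation between two acceleration series.

def acceleration(y, dt=1.0):
    return [(y[i] - 2.0 * y[i - 1] + y[i - 2]) / dt ** 2
            for i in range(2, len(y))]

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u)
           * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

tempo1 = [100.0, 104.0, 110.0, 112.0, 110.0, 104.0, 100.0, 98.0, 100.0, 104.0]
tempo2 = [v + 5.0 for v in tempo1]        # same shape, shifted overall level
a1, a2 = acceleration(tempo1), acceleration(tempo2)
print(round(corr(a1, a2), 6))
```

Because the second series differs only by a constant tempo offset, its acceleration curve is identical and the correlation equals 1: the acceleration comparison is insensitive to overall tempo level, which is what makes it suitable for comparing performers.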


[Figure 2.10 Acceleration of tempo curves for Cortot and Horowitz. Panels: a) Cortot (1935); b) Cortot (1947); c) Cortot (1953); d) Horowitz (1947); e) Horowitz (1963); f) Horowitz (1965). x-axis: onset time t (0 to 30); y-axis: a(t).]

2.5.2 Interpolated and smoothed tempo curves – velocity and acceleration

Conceptually it is plausible to assume that musicians control tempo in continuous time. The measure of acceleration given above is therefore a rather crude estimate of the actual acceleration curve. Interpolation splines provide a simple possibility to “guess” the tempo and its derivatives between the observed time points. One should bear in mind, however, that interpolation is always based on specific assumptions. For instance, cubic splines assume that the curve between two consecutive time points where observations are available is, or can be well approximated by, a third-degree polynomial. This assumption can hardly be checked experimentally and can lead to undesirable effects. Figure 2.12 shows the observed and interpolated tempo for Martha Argerich. While most of the interpolated values seem plausible, there are a few rather doubtful interpolations (marked with arrows) where the cubic polynomial by far exceeds each of the two observed values at the neighboring knots.


[Figure 2.11 Tempo acceleration – correlation with other performances. Panels: a) correlations of Cortot (1935) with the other performances; b) correlations of Horowitz (1947) with the other performances; c) mean correlations with other pianists; d) maximal correlations with other pianists. x-axis: performance (Argerich, Arrau, Askenaze, Brendel, Bunin, Capova, Cortot 1-3, Curzon, Davies, Demus, Eschenbach, Gianoli, Horowitz 1-3, Katsaris, Klien, Krust, Kubalek, Moiseiwitsch, Ney, Novaes, Ortiz, Schnabel, Shelley, Zak); y-axis: correlation.]


Figure 2.12 Martha Argerich – interpolation of tempo curve by cubic splines.

2.5.3 Tempo – hierarchical decomposition by smoothing

The tempo curve may be thought of as an aggregation of mostly smooth tempo curves at different onset-time scales. This corresponds to the general structure of music as a mixture of global and local structures at various scales. It is therefore interesting to look at smoothed tempo curves, and their derivatives, at different scales. Reasonable smoothing bandwidths may be guessed from the general structure of the composition, such as time signature(s), rhythmic, metric, melodic, and harmonic structure, and so on. For tempo curves of Schumann’s Traumerei (Figure 2.3), even multiples of 1/8th are plausible. Figures 2.13 through 2.16 show the following kernel-smoothed tempo curves with b1 = 8, b2 = 1, and b3 = 1/8 respectively:

g1(t) = (nb1)^{-1} Σ K((t − ti)/b1) yi                      (2.28)

g2(t) = (nb2)^{-1} Σ K((t − ti)/b2) [yi − g1(t)]            (2.29)

g3(t) = (nb3)^{-1} Σ K((t − ti)/b3) [yi − g1(t) − g2(t)]    (2.30)

and the residuals

e(t) = yi − g1(t)− g2(t)− g3(t). (2.31)
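A minimal numpy sketch of this hierarchical decomposition, assuming a Gaussian kernel K and synthetic tempo data (the kernel choice and the data are assumptions for illustration; the weights are normalized to sum to one at each evaluation point, a common practical variant of the (nb)^{-1} normalization):

```python
import numpy as np

def kernel_smooth(t_grid, t_obs, y, b):
    """Kernel smoother: weighted average of y with weights K((t - ti)/b),
    normalized so the weights sum to one at each evaluation point."""
    u = (t_grid[:, None] - t_obs[None, :]) / b
    w = np.exp(-0.5 * u**2)          # Gaussian kernel (an assumption)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
t = np.arange(0, 32, 0.125)          # onset times in 1/8th notes
y = np.sin(t / 8) + 0.3 * np.sin(2 * np.pi * t) \
    + 0.1 * rng.standard_normal(t.size)

g1 = kernel_smooth(t, t, y, b=8)                # global trend
g2 = kernel_smooth(t, t, y - g1, b=1)           # 4-bar-level fluctuations
g3 = kernel_smooth(t, t, y - g1 - g2, b=0.125)  # local 1/8th-level movement
e = y - g1 - g2 - g3                            # residuals, eq. (2.31)
```

By construction the four components add back up to the observed curve, mirroring the decomposition in (2.28)-(2.31).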


Figure 2.13 Smoothed tempo curves g1(t) = (nb1)^{-1} Σ K((t − ti)/b1) yi (b1 = 8). (One panel per pianist; smoothed component versus onset time t.)


Figure 2.14 Smoothed tempo curves g2(t) = (nb2)^{-1} Σ K((t − ti)/b2) [yi − g1(t)] (b2 = 1). (One panel per pianist; smoothed component versus onset time t.)


Figure 2.15 Smoothed tempo curves g3(t) = (nb3)^{-1} Σ K((t − ti)/b3) [yi − g1(t) − g2(t)] (b3 = 1/8). (One panel per pianist; smoothed component versus onset time t.)


Figure 2.16 Smoothed tempo curves – residuals e(t) = yi − g1(t) − g2(t) − g3(t). (One panel per pianist; residual versus onset time t.)


The tempo curves are thus decomposed into curves corresponding to a hierarchy of bandwidths. Each component reveals specific features. The first component reflects the overall tendency of the tempo. Most pianists have an essentially monotonically decreasing curve corresponding to a gradual, and towards the end emphasized, ritardando. For some performances (in particular Bunin, Capova, Gianoli, Horowitz 1, Kubalek, and Moisewitsch) there is a distinct initial acceleration with a local maximum in the middle of the piece. The second component g2(t) reveals tempo fluctuations that correspond to a natural division of the piece into 8 times 4 bars. Some pianists, like Cortot, greatly emphasize the 8×4 structure. For other pianists, such as Horowitz, the 8×4 structure is less evident: the smoothed tempo curve is mostly quite flat, though the main, but smaller, tempo changes do take place at the junctions of the eight parts. Also striking is the distinction between part B (bars 17 to 24) and the other parts (A, A′, A′′) of the composition – in particular in Argerich’s performance. The third component characterizes fluctuations at the resolution level of 2/8th. At this very local level, tempo changes frequently for pianists like Horowitz, whereas there is less local movement in Cortot’s performances. Finally, the residuals e(t) consist of the remaining fluctuations at the finest resolution of 1/8th. The similarity between the three residual curves by Horowitz illustrates that even at this very fine level, the “seismic” variation of tempo is a highly controlled process that is far from random.

2.5.4 Tempo curves and melodic indicator

In Chapter 3, the so-called melodic indicator will be introduced. One of the aims will be to “explain” some of the variability in tempo curves by melodic structures in the score. Consider a simple melodic indicator m(t) = wmelod(t) (see Section 3.3.4) that is essentially obtained by adding all indicators corresponding to individual motifs. Figures 2.17a and d display smoothed curves obtained by local polynomial smoothing of −m(t) using a large and a small bandwidth respectively. Figures 2.17b and e show the first derivatives of the two curves in 2.17a,d. Similarly, the second derivatives are given in Figures 2.17c and f. For the tempo curves, the first and second derivatives of local polynomial fits with b = 4 are given in Figures 2.18 and 2.19 respectively. A resemblance can be found in particular between the second derivative of −m(t) in Figure 2.17f and the second derivatives of the tempo curves in Figure 2.19. There are also interesting similarities and differences between the performances with respect to the local variability of the first two derivatives. Many pianists start with a very small second derivative, with strongly increased values in part B.
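Estimating a smooth curve together with its first two derivatives by local polynomial fitting can be sketched with scipy.signal.savgol_filter, which fits a polynomial of given order in a moving window. The window length, polynomial order, and the synthetic stand-in for −m(t) below are illustrative assumptions, not the book's choices:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic stand-in for the negative melodic indicator -m(t),
# sampled at equally spaced onset times (illustration only).
t = np.arange(0, 32, 0.25)
neg_m = -(80 + 4 * np.sin(2 * np.pi * t / 8))

dt = t[1] - t[0]
# Local quadratic fits in a 15-point window; deriv=k returns the
# k-th derivative of the locally fitted polynomial at each point.
smooth = savgol_filter(neg_m, window_length=15, polyorder=2)
d1 = savgol_filter(neg_m, window_length=15, polyorder=2, deriv=1, delta=dt)
d2 = savgol_filter(neg_m, window_length=15, polyorder=2, deriv=2, delta=dt)
```

Varying the window length plays the role of the large versus small span (24/32 versus 8/32) in Figure 2.17.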


Figure 2.17 Melodic indicator – local polynomial fits together with first and second derivatives. (Panels: a) −m(t) (span = 24/32); b) −m′(t) (span = 24/32); c) −m′′(t) (span = 24/32); d) −m(t) (span = 8/32); e) −m′(t) (span = 8/32); f) −m′′(t) (span = 8/32).)

2.5.5 Tempo and loudness

By invitation of Prince Charles, Vladimir Horowitz gave a benefit recital at London’s Royal Festival Hall on May 22, 1982. It was his first European appearance in 31 years. One of the pieces played at the concert was Schumann’s Kinderszene op. 15, No. 4. Figure 2.20 displays the (approximate) sound wave of Horowitz’s performance sampled from the CD recording. Two variables that can be extracted quite easily by visual inspection are: a) on the horizontal axis, the time when notes are played (and, derived from this quantity, the tempo) and b) on the vertical axis, loudness. More specifically, let t1, ..., tn be the score onset-times and u(t1), ..., u(tn) the corresponding performance times. Then an approximate tempo at score onset-time ti can be defined by y(ti) = (ti+1 − ti)/(u(ti+1) − u(ti)). A complication with loudness is that the amplitude level of piano sounds decreases gradually in a complex manner, so that “loudness” as such is not defined exactly. For simplicity, we therefore define loudness as the initial amplitude level (or rather its logarithm). Moreover, we consider only events where the score onset-time is a multiple of 1/8. For illustration, the first four events (score onset-times 1/8, 2/8, 3/8, 4/8) are marked with arrows in Figure 2.20.
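The tempo definition above is essentially a one-line computation; a sketch with made-up onset data (not the Horowitz measurements):

```python
import numpy as np

# Score onset times (in whole-note units) and hypothetical measured
# performance times in seconds -- illustration data only.
t = np.array([0.125, 0.250, 0.375, 0.500, 0.625])
u = np.array([0.00, 0.52, 1.01, 1.55, 2.20])

# y(ti) = (t_{i+1} - t_i) / (u(t_{i+1}) - u(t_i)):
# score time elapsed per second of performance time.
y = np.diff(t) / np.diff(u)
```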

An interesting question is what kind of relationship there may be between time delay y and loudness level x. The autocorrelations of x(ti) =


Figure 2.18 Tempo curves (Figure 2.3) – first derivatives obtained from local polynomial fits (span 24/32). (One panel per pianist; first derivative versus onset time t.)


Figure 2.19 Tempo curves (Figure 2.3) – second derivatives obtained from local polynomial fits (span 8/32). (One panel per pianist; second derivative versus onset time t.)


Figure 2.20 Kinderszene No. 4 – sound wave of performance by Horowitz at the Royal Festival Hall in London on May 22, 1982.

log(Amplitude) and y(ti), as well as the cross-autocorrelations between the two time series, are shown in Figure 2.21a. The main remarkable cross-autocorrelation occurs at lag 8. This can also be seen visually when plotting y(ti+8) against x(ti) (Figure 2.21b). There appears to be a strong relationship between the two variables, with the exception of four outliers. The three fitted lines correspond to: a) a least squares linear regression fit using all data; b) a robust high breakdown point and high efficiency regression (Yohai et al. 1991); and c) a least squares fit excluding the outliers. It should be noted that the “outliers” all occur together in a temporal cluster (see Figure 2.21c) and correspond to a phase where tempo is at its extreme (lowest for the first three outliers and fastest for the last outlier). This indicates that these are informative “outliers” (in contrast to wrong measurements) that should not be dismissed, since they may tell us something about the intention of the performer.

Finally, Figure 2.21d displays a sharpened version of the scatterplot in Figure 2.21b: points with high estimated joint density f(x, y) are marked with “O”. In contrast to what one would expect from a regression model with random errors εi that are independent of x, the points with highest density gather around a horizontal line rather than the regression line(s) fitted in Figure 2.21b. Thus, a linear regression model is hardly applicable. Instead, the data may possibly be divided into three clusters: a) a cluster with low loudness and low tempo; b) a second cluster with medium loudness and low to medium tempo; and c) a third cluster with a high level of loudness and medium to high tempo.
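Scatterplot sharpening of this kind can be sketched with a kernel density estimate: estimate f(x, y) at every data point and flag the points above some density cutoff. The Gaussian kernel, the synthetic data, and the 75% quantile cutoff below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Synthetic (loudness, lagged tempo) pairs -- illustration data only.
x = rng.normal(0.0, 1.0, 200)
y = 0.2 * x + rng.normal(0.0, 0.5, 200)

kde = gaussian_kde(np.vstack([x, y]))
density = kde(np.vstack([x, y]))      # estimated f(x, y) at each point

# "Sharpen": mark only points in the top quarter of density values;
# in Figure 2.21d these are the points plotted as "O".
cutoff = np.quantile(density, 0.75)
sharp = density > cutoff
```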


Figure 2.21 log(Amplitude) and tempo for Kinderszene No. 4 – auto- and cross-correlations (Figure 2.21a), scatterplot with fitted least squares and robust lines (Figure 2.21b), time series plots (Figure 2.21c), and sharpened scatterplot (Figure 2.21d).


2.5.6 Loudness and tempo – two-dimensional distribution function

In the example above, the correlation between loudness and tempo, when measured at the same time, turned out to be relatively small, whereas there appeared to be quite a clear lagged relationship. Does this mean that there is indeed no “immediate” relationship between these two variables? Consider x(ti) = log(Amplitude) and the logarithm of tempo. The scatterplot and the boxplot in Figures 2.22a and b rather suggest that there may be a relationship, but the dependence is nonlinear. This is further supported by the two-dimensional histogram (Figure 2.23a), the smoothed density (Figure 2.24a) and the corresponding image plots (Figures 2.23b and 2.24b; the actual observations are plotted as stars). The density was estimated by a kernel estimate with the Epanechnikov kernel. Since correlation only measures linear dependence, it cannot detect this kind of highly nonlinear relationship.
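A two-dimensional density estimate with a product Epanechnikov kernel can be sketched directly in numpy. The bandwidths, grid, and synthetic data below are ad hoc assumptions, not the book's choices:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], else 0."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde2d(x, y, gx, gy, bx, by):
    """Product-kernel density estimate evaluated on a grid (gx[i], gy[j])."""
    kx = epanechnikov((gx[:, None] - x[None, :]) / bx)   # (len(gx), n)
    ky = epanechnikov((gy[:, None] - y[None, :]) / by)   # (len(gy), n)
    n = x.size
    return kx @ ky.T / (n * bx * by)                     # (len(gx), len(gy))

rng = np.random.default_rng(2)
x = rng.normal(size=300)          # stand-in for log(tempo)
y = rng.normal(size=300)          # stand-in for log(Amplitude)
gx = np.linspace(-3, 3, 50)
gy = np.linspace(-3, 3, 50)
f = kde2d(x, y, gx, gy, bx=0.5, by=0.5)
```

The matrix f is what a perspective or image plot (Figure 2.24) would display.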

Figure 2.22 Horowitz’ performance of Kinderszene No. 4 – log(tempo) versus log(Amplitude), and boxplots of log(tempo) for three ranges of amplitude.

2.5.7 Melodic tempo-sharpening

Sharpening can also be applied by using an “external” variable. This is illustrated in Figures 2.25 through 2.27. Figure 2.25a displays the estimated density function of log(m + 1), where m(t) is the value of a melodic indicator at onset time t. The marked region corresponds to very high values of the density function f (namely f(x) > 0.793). This defines a set Isharp of corresponding “sharpening onset times”. The series m(t) is shown in Figure 2.25b, with sharpening onset times t ∈ Isharp highlighted by vertical


Figure 2.23 Horowitz’ performance of Kinderszene No. 4 – two-dimensional histogram of (x, y) = (log(tempo), log(Amplitude)) displayed in a perspective and an image plot respectively.

Figure 2.24 Horowitz’ performance of Kinderszene No. 4 – kernel estimate of the two-dimensional distribution of (x, y) = (log(tempo), log(Amplitude)) displayed in a perspective and an image plot respectively.


Figure 2.25 R. Schumann, Traumerei op. 15, No. 7 – density of melodic indicator with sharpening region (a) and melodic curve plotted against onset time, with sharpening points highlighted (b).


Figure 2.26 R. Schumann, Traumerei op. 15, No. 7 – tempo by Cortot and Horowitz at sharpening onset times. (Panels: CORTOT1, CORTOT2, CORTOT3, HOROWITZ1, HOROWITZ2, HOROWITZ3; tempo versus onset time.)

Figure 2.27 R. Schumann, Traumerei op. 15, No. 7 – tempo “derivatives” for Cortot and Horowitz at sharpening onset times. (Same panel layout as Figure 2.26; diff(tempo) versus onset time.)


lines. Figures 2.26 and 2.27 show the tempo y and its discrete “derivative” v(ti) = [y(ti+1) − y(ti)]/(ti+1 − ti) for ti ∈ Isharp and the performances by Cortot and Horowitz. The pictures indicate a systematic difference between Cortot and Horowitz. A common feature is the negative derivative at the fifth and sixth sharpening onset time.
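The density-threshold sharpening set and the discrete derivative can be sketched as follows. The synthetic melodic indicator and tempo curve, the Gaussian density estimator, and the quantile cutoff are illustrative assumptions (the book uses a fixed threshold 0.793 for its melodic indicator):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
t = np.arange(0, 32, 0.25)                     # onset times
m = np.abs(rng.normal(3.0, 1.0, t.size))       # synthetic melodic indicator
y = 60 + 5 * np.sin(2 * np.pi * t / 8)         # synthetic tempo curve

# 1) Sharpening onset times: where log(m + 1) falls in a high-density
#    region of its estimated distribution (quantile cutoff is ad hoc).
z = np.log(m + 1)
dens = gaussian_kde(z)(z)
i_sharp = np.flatnonzero(dens > np.quantile(dens, 0.75))
i_sharp = i_sharp[i_sharp < t.size - 1]        # need a successor for v(ti)

# 2) Discrete "derivative" v(ti) = [y(t_{i+1}) - y(ti)] / (t_{i+1} - ti),
#    evaluated only at the sharpening onset times.
v = (y[i_sharp + 1] - y[i_sharp]) / (t[i_sharp + 1] - t[i_sharp])
```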

2.6 Some multivariate descriptive displays

2.6.1 Definitions

Suppose that we observe multivariate data x1, x2, ..., xn where each xi is a p-dimensional vector (xi1, ..., xip)t ∈ Rp. Obvious numerical summary statistics are the sample mean

x̄ = (x̄1, x̄2, ..., x̄p)t

where x̄j = n^{-1} Σ_{i=1}^n xij, and the p × p covariance matrix S with elements

Sjl = (n − 1)^{-1} Σ_{i=1}^n (xij − x̄j)(xil − x̄l).
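These two summary statistics are one call each in numpy; ddof=1 gives the (n − 1)^{-1} normalization used in the definition of S (the random data are illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))        # n = 100 observations, p = 5

xbar = X.mean(axis=0)                # sample mean vector, shape (p,)
S = np.cov(X, rowvar=False, ddof=1)  # p x p sample covariance matrix

# Cross-check S against the elementwise definition
# S_jl = (n-1)^{-1} sum_i (x_ij - xbar_j)(x_il - xbar_l).
Xc = X - xbar
S_manual = Xc.T @ Xc / (X.shape[0] - 1)
```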

Most methods for analyzing multivariate data are based on these two statistics. One of the main tools consists of dimension reduction by suitable projections, since it is easier to find and visualize structure in low dimensions. These techniques go far beyond descriptive statistics. We therefore postpone the discussion of these methods to Chapters 8 to 11. Another set of methods consists of visualizing individual multivariate observations. The main purpose is a simple visual identification of similarities and differences between observations, as well as the search for clusters and other patterns. Typical examples are:

• Faces: xi = (xi1, ..., xip)t is represented by a face with features depending on the values of the corresponding coordinates. For instance, the face function in S-Plus has the following correspondence between coordinates and feature parameters: xi,1 = area of face; xi,2 = shape of face; xi,3 = length of nose; xi,4 = location of mouth; xi,5 = curve of smile; xi,6 = width of mouth; xi,7 = location of eyes; xi,8 = separation of eyes; xi,9 = angle of eyes; xi,10 = shape of eyes; xi,11 = width of eyes; xi,12 = location of pupil; xi,13 = location of eyebrow; xi,14 = angle of eyebrow; xi,15 = width of eyebrows.

• Stars: Each coordinate is represented by a ray in a star, the length of each ray corresponding to the value of the coordinate. More specifically, a star for a data vector xi = (xi1, ..., xip)t is constructed as follows:

1. Scale the data to the range [0, r]: 0 ≤ x1j, ..., xnj ≤ r;

2. Draw p rays at angles ϕj = 2π(j − 1)/p (j = 1, ..., p); for a star with origin 0 representing observation xi, the end point of the jth ray has the coordinates r · (xij cos ϕj, xij sin ϕj);

3. For visual reasons, the end points of the rays may be connected by straight lines.

• Profiles: An observation xi = (xi1, ..., xip)t is represented by a plot of xij versus j where neighboring points xi,j−1 and xij (j = 1, ..., p) are connected.

• Symbol plot: The horizontal and vertical positions represent xi1 and xi2 respectively (or any other two coordinates of xi). The other coordinates xi3, ..., xip determine p − 2 characteristic shape parameters of a geometric object that is plotted at the point (xi1, xi2). Typical symbols are circles (one additional dimension), rectangles (two additional dimensions), stars (arbitrary number of additional dimensions), and faces (arbitrary number of additional dimensions).
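The star construction (steps 1-2 above) can be sketched in a few lines of numpy; the min-max column scaling is an assumption about how step 1 is implemented:

```python
import numpy as np

def star_coordinates(X, r=1.0):
    """End points of the p rays for each observation.

    X: (n, p) data matrix. Returns an (n, p, 2) array of (x, y) end
    points; connecting consecutive end points gives the star outline.
    """
    n, p = X.shape
    # Step 1: scale every column to [0, r] (min-max, an assumption).
    lo, hi = X.min(axis=0), X.max(axis=0)
    Z = r * (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    # Step 2: ray angles phi_j = 2*pi*(j-1)/p, end point on each ray.
    phi = 2 * np.pi * np.arange(p) / p
    return np.stack([Z * np.cos(phi), Z * np.sin(phi)], axis=-1)

rng = np.random.default_rng(5)
stars = star_coordinates(rng.random((10, 11)))   # 10 stars with 11 rays
```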

2.7 Specific applications in music – multivariate

2.7.1 Distribution of notes – Chernoff faces

In music that is based on scales, pitch (modulo 12) is usually not equally distributed. Notes that belong to the main scale are more likely to occur, and within these, there are certain preferred notes as well (e.g. the roots of the tonic, subtonic, and supertonic triads). To illustrate this, we consider the following compositions: 1. Saltarello (Anonymus, 13th century); 2. Prelude and Fugue No. 1 from “Das Wohltemperierte Klavier” (J. S. Bach, 1685-1750); 3. Kinderszene op. 15, No. 1 (R. Schumann, 1810-1856); 4. Piano piece op. 19, No. 2 (A. Schonberg, 1874-1951; Figure 2.28); 5. Rain Tree Sketch 1 (T. Takemitsu, 1930-1996). For each composition, the distribution of notes (pitches) modulo 12 is calculated and centered around the “central pitch” (defined as the most frequent pitch modulo 12). Thus, the central pitch is defined as zero. We then obtain five vectors of relative frequencies pj = (pj0, ..., pj11)t (j = 1, ..., 5) characterizing the five compositions. In addition, for each of these vectors the number nj of local peaks in pj is calculated. We say that a local peak occurs at i ∈ {1, ..., 10} if pji > max(pj,i−1, pj,i+1). For i = 11, we say that a local peak occurs if pji > pj,i−1. Figure 2.29a displays Chernoff faces of the 12-dimensional vectors vj = (nj, pj1, ..., pj11)t. In Figure 2.29b, the coordinates of vj (and thus the assignment of feature variables) were permuted. The two plots illustrate the usefulness of Chernoff faces, and at the same time the difficulties in finding an objective interpretation. On the one hand, the method discovers a plausible division into two groups: both pictures show a clear distinction between classical tonal music (first three faces) and the three representatives of “avant-garde” music of the 20th century. On the other hand, the


exact nature of the distinction cannot be seen. In Figure 2.29a, the classical faces look much more friendly than the rather miserable avant-garde fellows. The judgment of conservative music lovers that “avant-garde” music is unbearable, depressing, or even bad for health, seems to be confirmed! Yet, bad temper is the response of the classical masters to a simple permutation of the variables (Figure 2.29b), whereas the grim avant-garde seems to be much more at ease. The difficulty in interpreting Chernoff faces is that the result depends on the order of the variables, whereas due to their psychological effect most feature variables are not interchangeable.
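The centered pitch-class distribution and the local-peak count behind these face vectors can be sketched as follows. MIDI-style integer pitches and the short note list are assumptions for illustration:

```python
import numpy as np

def pitch_class_profile(pitches):
    """Relative frequencies of pitch mod 12, rotated so that the most
    frequent pitch class (the "central pitch") sits at position 0."""
    counts = np.bincount(np.asarray(pitches) % 12, minlength=12)
    p = counts / counts.sum()
    return np.roll(p, -int(np.argmax(p)))

def n_local_peaks(p):
    """Number of local peaks in p[1..11]: p_i > max(p_{i-1}, p_{i+1})
    for i = 1..10, plus the boundary case p_11 > p_10."""
    peaks = sum(p[i] > max(p[i - 1], p[i + 1]) for i in range(1, 11))
    return peaks + int(p[11] > p[10])

# Hypothetical C-major-flavored note sequence (MIDI numbers).
notes = [60, 62, 64, 65, 67, 69, 71, 72, 60, 64, 67, 60]
p = pitch_class_profile(notes)
n = n_local_peaks(p)
```

The 12-dimensional feature vector for one composition would then be (n, p[1], ..., p[11]).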

Figure 2.28 Arnold Schonberg (1874-1951), self-portrait. (Courtesy of Verwertungsgesellschaft Bild-Kunst, Bonn.)

2.7.2 Distribution of notes – star plots

We consider once more the distribution vectors pj = (pj0, ..., pj11)t of pitch modulo 12 where 0 is the tonal center. In contrast to Chernoff faces, permutation of coordinates in star plots is much less likely to have a subjective influence on the interpretation of the picture. Nevertheless, certain patterns can become more visible when using an appropriate ordering of the variables. From the point of view of tonal music, a natural ordering of pitch can be obtained, for instance, from the ascending circle of fourths. This leads to the following permutation: p∗j = (p5, p10, p3, p8, p1, p6, p11, p4, p9, p2, p7)t. (p0 is omitted, since it is maximal by definition for all compositions.) Since stars are easy to look at, it is possible to compare a large number of observations simultaneously. We consider the following set of compositions:


Figure 2.29 a) Chernoff faces for 1. Saltarello (Anonymus, 13th century); 2. Prelude and Fugue No. 1 from “Das Wohltemperierte Klavier” (J. S. Bach, 1685-1750); 3. Kinderszene op. 15, No. 1 (R. Schumann, 1810-1856); 4. Piano piece op. 19, No. 2 (A. Schonberg, 1874-1951); 5. Rain Tree Sketch 1 (T. Takemitsu, 1930-1996). (Faces labeled ANONYMUS, BACH, SCHUMANN, WEBERN, SCHOENBERG, TAKEMITSU.)

Figure 2.29 b) Chernoff faces for the same compositions as in Figure 2.29a, after permuting coordinates.

©2004 CRC Press LLC

Page 73: Statistics in Musicology

• A. de la Halle (1235?-1287): “Or est Bayard en la pature, hure!”;

• J. de Ockeghem (1425-1495): Canon epidiatesseron;

• J. Arcadelt (1505-1568): a) Ave Maria, b) La ingratitud, c) Io dico fra noi;

• W. Byrd (1543-1623): a) Ave Verum Corpus, b) Alman, c) The Queen’s Alman;

• J.P. Rameau (1683-1764): a) La Popliniere, b) Le Tambourin, c) La Triomphante;

• J.S. Bach (1685-1750): Das Wohltemperierte Klavier – Preludes and Fugues No. 5, 6, and 7;

• D. Scarlatti (1685-1757): Sonatas K 222, K 345, and K 381;

• J. Haydn (1732-1809): Sonata op. 34, No. 2;

• W.A. Mozart (1756-1791): 2nd movements of Sonatas KV 332, KV 545, and KV 333;

• M. Clementi (1752-1832): Gradus ad Parnassum – Studies 2 and 9 (Figure 11.4);

• R. Schumann (1810-1856): Kinderszenen op. 15, No. 1, 2, and 3;

• F. Chopin (1810-1849): a) Nocturne op. 9, No. 2, b) Nocturne op. 32, No. 1, c) Etude op. 10, No. 6;

• R. Wagner (1813-1883): a) Bridal Choir from “Lohengrin”, b) Ouverture to Act 3 of “Die Meistersinger”;

• C. Debussy (1862-1918): a) Clair de lune, b) Arabesque No. 1, c) Reflets dans l’eau;

• A. Scriabin (1872-1915): Preludes op. 2/2, op. 11/14 and op. 13/2;

• B. Bartok (1881-1945): a) Bagatelle op. 11, No. 2 and 3, b) Sonata for Piano;

• O. Messiaen (1908-1992): Vingt regards sur l’enfant de Jesus, No. 3;

• S. Prokoffieff (1891-1953): Visions fugitives No. 11, 12 and 13;

• A. Schonberg (1874-1951): Piano piece op. 19, No. 2;

• T. Takemitsu (1930-1996): Rain Tree Sketch No. 1;

• A. Webern (1883-1945): Orchesterstuck op. 6, No. 6;

• J. Beran (*1959): Santi – piano concerto No. 2 (beginning of 2nd movement).

The star plots of p∗j are given in Figure 2.31. From Halle (cf. Figure 2.30) up to about the early Scriabin, the long beams form more or less a half-circle. This means that the most frequent notes are neighbors in the circle of fourths and are much more frequent than all other notes. This is indeed what one would expect of music composed in the tonal system. The picture starts changing in the neighborhood of Scriabin, where long beams are either


isolated (most extremely for Bartok’s Bagatelle No. 3) or tend to cover more or less the whole range of notes (e.g. Bartok, Prokoffieff, Takemitsu, Beran). Due to the variety of styles in the 20th century, the specific shape of each of the stars would need to be discussed individually in detail. For instance, Messiaen’s shape may be explained by the specific scales (Messiaen scales) he used. Generally speaking, the difference between star plots of 20th-century and earlier music reflects the replacement of the traditional tonal system with major/minor scales by other principles.

Figure 2.30 The minnesinger Burchard von Wengen (1229-1280), contemporary of Adam de la Halle (1235?-1288). (From Codex Manesse, courtesy of the University Library Heidelberg.) (Color figures follow page 152.)


Figure 2.31 Star plots of p∗j = (p5, p10, p3, p8, p1, p6, p11, p4, p9, p2, p7)t for compositions from the 13th to the 20th century, with notes ordered according to the ascending circle of fourths. (One star per composition, labeled HALLE through BERAN in the order of the composition list above.)

2.7.3 Joint distribution of interval steps of envelopes

Consider a composition consisting of onset times ti and pitch values x(ti). In a polyphonic score, several notes may be played simultaneously. To simplify analysis, we define a simplified score by considering the lower and upper envelope:

Definition 24 Let

C = {(ti, x(ti)) : ti ∈ A, x(ti) ∈ B, i = 1, 2, ..., N} = ∪_{j=1}^n Cj

where A = {t∗1, ..., t∗n} ⊂ Z+ (t∗1 < t∗2 < ... < t∗n), B ⊂ R or Z, and Cj = {(t, x(t)) ∈ C : t = t∗j}. Then the lower and upper envelope of C are defined by

Elow = {(t∗j, min_{(t,x(t))∈Cj} x(t)), j = 1, ..., n}

and

Eup = {(t∗j, max_{(t,x(t))∈Cj} x(t)), j = 1, ..., n}.

In other words, for each onset time, the lowest and highest notes are selected to define the lower and upper envelope, respectively. In the example below, we consider interval steps ∆y(ti) = y(ti+1) − y(ti) mod 12 for the upper envelope of a composition with onset times t1, ..., tn and pitches y(t1), ..., y(tn). A simple aspect of melodic and harmonic structure is the question in which sequence intervals are likely to occur. Here, we look at the empirical two-dimensional distribution of (∆y(ti), ∆y(ti+1)). For each pair (i, j) (−11 ≤ i, j ≤ 11, i, j ≠ 0), we count the number nij of occurrences and define Nij = log(nij + 1). (The value 0 is excluded here, since repetitions of a note, or transposition by an octave, are less interesting.) If only the type of interval and not its direction is of interest, then i, j assume the values 1 to 11 only. A useful representation of Nij can be obtained by a symbol plot. In Figures 2.32 and 2.33, the x- and y-coordinates correspond to i and j, respectively. The radius of a circle with center (i, j) is proportional to Nij. The compositions considered here are: a) J.S. Bach: Präludium No. 1 from "Das Wohltemperierte Klavier"; b) W.A. Mozart: Sonata KV 545 (beginning of 2nd movement); c) A. Scriabin: Prelude op. 51, No. 4; and d) F. Martin: Prelude No. 6. For Bach's piece, there is a clear clustering into three main groups in the first plot (there are almost never two successive interval steps downwards) and a horseshoe-like pattern for absolute intervals. Remarkable are the clear negative correlation in Mozart's first plot and the concentration on a few selected interval sequences. A negative correlation in the plots of interval steps with sign can also be found for Scriabin and Martin. However, considering only the types of intervals without their sign, the number and variety of interval sequences that are used relatively frequently is much higher for Scriabin, and even more so for Martin. For Martin, the plane of absolute intervals (Figure 2.33d) is filled almost uniformly.
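The envelope construction of Definition 24 and the counts Nij = log(nij + 1) translate into a few lines of code. The sketch below is illustrative Python, not the book's implementation; in particular, the signed octave reduction in signed_step is one plausible reading of "∆y mod 12" combined with the stated range −11, ..., 11:

```python
import math
from collections import Counter

def upper_envelope(score):
    """score: iterable of (onset, pitch) pairs, possibly polyphonic.
    Returns the highest pitch at each onset, in time order (Definition 24)."""
    by_onset = {}
    for t, pitch in score:
        by_onset[t] = max(pitch, by_onset.get(t, pitch))
    return [by_onset[t] for t in sorted(by_onset)]

def signed_step(a, b):
    """Octave-reduced interval step in -11..11 (assumed reading of the text)."""
    d = b - a
    return (1 if d >= 0 else -1) * (abs(d) % 12)

def interval_pair_counts(pitches):
    """N_ij = log(n_ij + 1) over successive interval pairs; steps of 0 excluded."""
    steps = [signed_step(a, b) for a, b in zip(pitches, pitches[1:])]
    pairs = [(i, j) for i, j in zip(steps, steps[1:]) if i != 0 and j != 0]
    return {p: math.log(c + 1) for p, c in Counter(pairs).items()}
```

The resulting dictionary maps each pair (i, j) to Nij, which could then drive the circle radii of a symbol plot.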

2.7.4 Pitch distribution – symbol plots with circles

Consider once more the distribution vectors pj = (pj0, ..., pj11)t of pitch modulo 12, as in the star-plot example above. The star plots show a clear distinction between "modern" compositions and classical tonal compositions. Symbol plots can be used to see more clearly which composers (or compositions) are close with respect to pj. In Figure 2.34, the x- and y-axes correspond to pj5 and pj7. Recall that if 0 is the root of the tonic triad, then 5 is the root of the subtonic and 7 the root of the dominant


Figure 2.32 Symbol plot of the distribution of successive interval pairs (∆y(ti), ∆y(ti+1)) (a, c) and their absolute values (b, d), respectively, for the upper envelopes of Bach's Präludium No. 1 (Das Wohltemperierte Klavier I) and Mozart's Sonata KV 545 (beginning of 2nd movement).


Figure 2.33 Symbol plot of the distribution of successive interval pairs (∆y(ti), ∆y(ti+1)) (a, c) and their absolute values (b, d), respectively, for the upper envelopes of Scriabin's Prelude op. 51, No. 4 and F. Martin's Prelude No. 6.


triad. The radius of the circles in Figure 2.34 is proportional to pj1, the frequency of the "dissonant" minor second. In color Figure 2.35, the radius represents pj6, i.e. the augmented fourth. Both plots show a clear positive relationship between pj5 and pj7. Moreover, the circles tend to be larger for small values of x and y. The positioning in the plane, together with the size of the circles, separates (apart from a few exceptions) classical tonal compositions from more recent ones. To visualize this, four different colors are chosen for "early music" (black), "baroque and classical" (green), "romantic" (blue), and "20th/21st century" (red). The clustering of the four colors indicates that there is indeed an approximate clustering according to the four time periods. Interesting exceptions can be observed for "early" music, with two extreme "outliers" (Halle and Arcadelt). Also, one piece by Rameau is somewhat far from the rest.
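The distribution vectors pj underlying all of these plots are simply relative frequencies of pitch classes. A minimal sketch (illustrative Python; the helper name is hypothetical, not from the book):

```python
from collections import Counter

def pitch_class_distribution(pitches):
    """Relative frequencies (p_j0, ..., p_j11) of pitch classes mod 12."""
    counts = Counter(p % 12 for p in pitches)
    n = len(pitches)
    return [counts.get(c, 0) / n for c in range(12)]
```

For a symbol plot as in Figure 2.34, one would compute this vector per composition and plot component 5 against component 7.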

Figure 2.34 Symbol plot with x = pj5, y = pj7 and radius of circles proportional to pj1.

2.7.5 Pitch distribution – symbol plots with rectangles

By using rectangles, four dimensions can be represented. Color Figure 2.36 shows a symbol plot with (x, y)-coordinates (pj5, pj7) and rectangles with width


Figure 2.35 Symbol plot with x = pj5, y = pj7 and radius of circles proportional to pj6. (Color figures follow page 152.)

pj1 (diminished second) and height pj6 (augmented fourth). Using the same colors for the names as above, a similar clustering as in the circle plot can be observed. The picture not only visualizes a clear four-dimensional relationship between pj1, pj5, pj6, and pj7, but also shows that these quantities are related to the time period.

2.7.6 Pitch distribution – symbol plots with stars

Five dimensions are visualized in color Figure 2.37, with (x, y) = (pj5, pj7) and the variables pj1, pj6, and pj10 (diminished seventh) defining a star plot for each observation, the first variable starting on the right and the subsequent variables winding counterclockwise around the star (in this case a triangle). The shape of the triangle is obviously a characteristic of the time period. For tonal music composed mostly before about 1900, the stars are very narrow, with a relatively long beam in the direction of the diminished seventh. The diminished seventh is indeed an important pitch in tonal music, since it is the fourth note in the dominant seventh chord to the subtonic. In contrast, notes that are a diminished second and an


Figure 2.36 Symbol plot with x = pj5, y = pj7. The rectangles have width pj1 (diminished second) and height pj6 (augmented fourth). (Color figures follow page 152.)


augmented fourth above the root of the tonic triad form, together with the tonic root, highly dissonant intervals and are therefore less frequent in tonal music. Color Figure 2.37 shows the triangles; the names without the triangles are plotted in color Figure 2.38.

2.7.7 Pitch distribution – profile plots

Finally, as an alternative to star plots, Figure 2.39 displays profile plots of p∗j = (p5, p10, p3, p8, p1, p6, p11, p4, p9, p2, p7)t. For compositions up to about 1900, the profiles are essentially U-shaped. This corresponds to stars with clustered long and short beams, respectively, as seen previously. For "modern" compositions, there is a large variety of shapes different from a U-shape.


Figure 2.37 Symbol plot with x = pj5, y = pj7, and triangles defined by pj1 (diminished second), pj6 (augmented fourth), and pj10 (diminished seventh). (Color figures follow page 152.)


Figure 2.38 Names plotted at locations (x, y) = (pj5, pj7). (Color figures follow page 152.)


[Figure 2.39: 40 profile plots, one per composition, labeled by composer as in Figure 2.31.]

Figure 2.39 Profile plots of p∗j = (p5, p10, p3, p8, p1, p6, p11, p4, p9, p2, p7)t.


CHAPTER 3

Global measures of structure and randomness

3.1 Musical motivation

Essential aspects of music may be summarized under the keywords "structure", "information", and "communication". Even aleatoric pieces, where events are generated randomly (e.g. Cage, Xenakis, Lutoslawski), have structure and information induced by the definition of specific random distributions. It is therefore meaningful to measure the amount of structure and information contained in a composition. Clearly, this is a nontrivial task, and many different, possibly controversial, definitions can be invented. In this chapter, two types of measures are discussed: 1) general global measures of information or randomness, and 2) specific local measures indicating metric, melodic, and harmonic structures.

3.2 Basic principles

3.2.1 Measuring information and randomness

There is an enormous amount of literature on information measures and their applications. In this section, only some fundamental definitions and results are reviewed. These and other classical results can be found, in particular, in Fisher (1925, 1956), Hartley (1928), Bhattacharyya (1946a), Erdos (1946), Wiener (1948), Shannon (1948), Shannon and Weaver (1949), Barnard (1951), McMillan (1953), Mandelbrot (1953, 1956), Khinchin (1953, 1956), Goldman (1953), Bartlett (1955), Brillouin (1956), Kolmogorov (1956), Ashby (1956), Joshi (1957), Kullback (1959), Wolfowitz (1957, 1958, 1961), Woodward (1953), and Renyi (1959a,b, 1961, 1965, 1970); see also Ash (1965) for an overview. A classical measure of information (or randomness) is entropy, also called Shannon information (Shannon 1948, Shannon and Weaver 1949). To explain its meaning, consider the following question: how much information is contained in a message, or, more specifically, what is the necessary number of digits to encode the message unambiguously in the binary system? For instance, if the entire vocabulary consisted only of the words "I", "hungry", "not", "very", then the words could be identified with the binary numbers 00 = "I", 01 = "hungry", 10 =


Figure 3.1 Ludwig Boltzmann (1844-1906). (Courtesy of Österreichische Post AG.)

"not" and 11 = "very". Thus, for a vocabulary V of |V| = N = 2^2 words, n = 2 digits would be sufficient. More generally, suppose that we have a set V with N = 2^n elements. Then we need n = log2 N digits to encode the elements in the binary system. The number n is then called the information of a message from vocabulary V. Note that in the special case where V consists of one element only, n = 0, i.e. the information content of a message is zero, because we know which element of V will be contained in the message even before receiving it.

An extension of this definition to integers N that are not necessarily powers of 2 can be justified as follows: consider a sequence of k elements from V. The number of sequences v1, ..., vk (vi ∈ V) is N^k. (Note that one element is allowed to occur more than once.) The number of binary digits needed to express a sequence v1, ..., vk is nk, where 2^(nk−1) < N^k ≤ 2^(nk). The average number of digits needed to express an element of this sequence is nk/k, where k log2 N ≤ nk < k log2 N + 1. We then have

lim_{k→∞} nk/k = log2 N.

The following definition is therefore meaningful:

Definition 25 Let VN be a finite set with N elements. Then the information necessary to characterize the elements of VN is defined by

I(VN) = log2 N. (3.1)

This definition can also be derived by postulating the following properties a measure of information should have:

1. Additivity: if |VK| = N·M, then I(VK) = I(VN) + I(VM);
2. Monotonicity: I(VN) ≤ I(VN+1);
3. Definition of unit: I(V2) = 1.

The only function that satisfies these conditions is I(VN) = log2 N.

Consider now a more complex situation where VN = ∪_{j=1}^k Vj, Vj ∩ Vl = ∅ (j ≠ l) and |Vj| = Nj (and hence N = N1 + ... + Nk), and define pj = Nj/N. Suppose that we select an element from V randomly, each element having the same probability of being chosen. If an element v ∈ V is known to belong to a specific Vj, then the additional information needed to identify it within Vj is equal to I(Vj) = log2 Nj. The expected value of this additional information is therefore

I2 = Σ_{j=1}^k pj log2 Nj = Σ_{j=1}^k pj log2(N pj). (3.2)

Let I1 be the information needed to identify the set Vj which v belongs to. Then the total information needed for identifying (encoding) elements of V is

log2 N = I1 + I2. (3.3)

On the other hand, Σ_j pj log2 N = log2 N, so that we obtain Shannon's famous formula

I1 = −Σ_{j=1}^k pj log2 pj. (3.4)

I1 is also called Shannon information. Shannon information is thus the expected information about the occurrence of the sets V1, ..., Vk contained in a randomly chosen element from V. Note that the term "information" can be used synonymously with "uncertainty": the information obtained from a random experiment diminishes uncertainty by the same amount. The derivation of Shannon information is credited to Shannon (1948) and, independently, Wiener (1948). In physics, an analogous formula is known as entropy and is a measure of the disorder of a system (see Boltzmann 1896, Figure 3.1).
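Formula (3.4) is easily computed. A minimal Python sketch (illustrative, not from the book):

```python
import math

def shannon_information(p):
    """Shannon information I(p1,...,pk) = -sum_j pj*log2(pj), in bits.
    Terms with pj = 0 contribute nothing, by the usual convention."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)
```

For the four-word vocabulary above, `shannon_information([0.25, 0.25, 0.25, 0.25])` returns 2.0 bits, in agreement with the two-digit binary encoding; a degenerate distribution yields 0 bits.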

Shannon's formula can also be derived by postulating the following properties for a measure of information of the outcome of a random experiment: let V1, ..., Vk be the possible outcomes of a random experiment and denote by pj = P(Vj) the corresponding probabilities. Then a measure of information, say I, obtained by the outcome of the random experiment should have the following properties:

1. Function of probabilities: I = I(p1, ..., pk), i.e. I depends on the probabilities pj only;
2. Symmetry: I(p1, ..., pk) = I(pπ(1), ..., pπ(k)) for any permutation π;
3. Continuity: I(p, 1 − p) is a continuous function of p (0 ≤ p ≤ 1);
4. Definition of unit: I(1/2, 1/2) = 1;


5. Additivity and weighting by probabilities:

I(p1, ..., pk) = I(p1 + p2, p3, ..., pk) + (p1 + p2) I(p1/(p1 + p2), p2/(p1 + p2)). (3.5)

The meaning of the first four properties is obvious. The last property can be interpreted as follows: suppose the outcome of an experiment does not distinguish between V1 and V2, i.e. if v turns out to be in one of these two sets, we only know that v ∈ V1 ∪ V2. Then the information provided by the experiment is I(p1 + p2, p3, ..., pk). If the experiment did distinguish between V1 and V2, then it is reasonable to assume that the information would be larger by the amount

(p1 + p2) I(p1/(p1 + p2), p2/(p1 + p2)).

Equation (3.5) tells us exactly that: the complete information I(p1, ..., pk) can be obtained by adding the partial and the additional information. It turns out that the only function for which the postulates hold is Shannon's information:

Theorem 9 Let I be a functional that assigns to each finite discrete distribution P (defined by probabilities p1, ..., pk, k ≥ 1) a real number I(P), such that the properties above hold. Then

I(P) = I(p1, ..., pk) = −Σ_{j=1}^k pj log2 pj. (3.6)

Shannon information has an obvious upper bound that follows from Jensen's inequality. Recall that Jensen's inequality states that for a convex function g and weights wj ≥ 0 with Σ wj = 1 we have

g(Σ_j wj xj) ≤ Σ_j wj g(xj).

In particular, for g(x) = x log2 x and wj = 1/k,

(1/k) Σ_j g(pj) = (1/k) Σ_j pj log2 pj ≥ g((1/k) Σ_j pj) = −(1/k) log2 k.

Hence,

I(P) ≤ log2 k. (3.7)

This bound is achieved by the uniform distribution pj = 1/k. The other extreme case is pj = 1 for some j. This means that event Vj occurs with certainty, and I(p1, ..., pk) = I(pj) = I(1) = I(1, 0) = I(1, 0, 0), etc. From the fifth property we then have I(1, 0) = I(1) + I(1, 0), so that I(1) = 0. The interpretation is that, if it is clear a priori which event will occur, then a random experiment does not provide any information.

The notion of information can be extended in an obvious way to the case where one has an infinite but countable number of possible outcomes. The information contained in the realization of a random variable X with possible outcomes x1, x2, ... is defined by

I(X) = −Σ_j pj log2 pj

where pj = P(X = xj). More subtle is the extension to continuous distributions and random variables. A nice illustration of the problem is given in Renyi (1970): for a random variable X with uniform distribution on (0, 1), the digits in the binary expansion of X are infinitely many independent 0-1 random variables where 0 and 1 occur with probability 1/2 each. The information furnished by a realization of X would therefore be infinite. Nevertheless, a meaningful measure of information can be defined as a limit of discrete approximations:

Theorem 10 Let X be a random variable with density function f. Define XN = [NX]/N, where [x] denotes the integer part of x. If I(X1) < ∞, then the following holds:

lim_{N→∞} I(XN)/log2 N = 1, (3.8)

lim_{N→∞} (I(XN) − log2 N) = −∫_{−∞}^{∞} f(x) log2 f(x) dx. (3.9)

We thus have

Definition 26 Let X be a random variable with density function f. Then

I(X) = −∫_{−∞}^{∞} f(x) log2 f(x) dx (3.10)

is called the information (or entropy) of X.

Note that, in contrast to discrete distributions, information can be negative. This is due to the fact that I(X) is in fact the limit of a difference of informations.
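Theorem 10 can be checked numerically. The sketch below is illustrative (the truncation bound j_max and the choice of the Exponential(1) distribution are assumptions made here, not the book's example): it discretizes X to XN = [NX]/N and verifies that I(XN) − log2 N approaches the differential entropy (3.10), which for Exponential(1) equals log2 e = 1/ln 2 ≈ 1.4427 bits.

```python
import math

def discretized_information(cdf, N, j_max):
    """Entropy I(X_N) of X_N = [N X]/N: cell probabilities
    p_j = P(j/N <= X < (j+1)/N), truncated after j_max cells."""
    info = 0.0
    for j in range(j_max):
        p = cdf((j + 1) / N) - cdf(j / N)
        if p > 0:
            info -= p * math.log2(p)
    return info

# Exponential(1): F(x) = 1 - exp(-x); -∫ f log2 f dx = log2(e) ≈ 1.4427 bits.
N = 1000
I_N = discretized_information(lambda x: 1.0 - math.exp(-x), N, j_max=40 * N)
approx = I_N - math.log2(N)  # should be close to the differential entropy
```

With N = 1000 the approximation agrees with log2 e to about three decimal places, illustrating (3.9).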

The notion of entropy can also be carried over to measuring randomness, in the sense of correlations, in stationary time series. (For the definition of stationarity and of time series in general, see Chapter 4.)

Definition 27 Let Xt (t ∈ Z) be a stationary process with var(Xt) = 1 and spectral density f. Then the spectral entropy of Xt is defined by

I(Xt, t ∈ Z) = −∫_{−π}^{π} f(x) log2 f(x) dx. (3.11)

This definition is plausible, because for a process with unit variance, f has the same properties as a probability distribution and can be interpreted as a distribution on frequencies. The process Xt is uncorrelated if and only if f is constant, i.e. if f is the uniform distribution on [−π, π]. Exactly in this case entropy is maximal, and knowledge of past observations does not help to predict future observations. On the other hand, if f has one or more


extreme peaks, then entropy is very low (and in the limit minus infinity). This corresponds to the fact that in this case future observations can be predicted with high accuracy from past values. Thus, future observations do not contain as much new information as in the case of independence.
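Definition 27 can be evaluated numerically. In the sketch below (illustrative; the AR(1) example and the midpoint-rule integration are choices made here, not the book's), white noise with unit variance attains the maximal spectral entropy log2(2π), while a strongly correlated AR(1) process, rescaled to unit variance, has lower spectral entropy:

```python
import math

def spectral_entropy(f, n=20000):
    """-∫_{-π}^{π} f(x) log2 f(x) dx, approximated by the midpoint rule."""
    h = 2 * math.pi / n
    s = 0.0
    for i in range(n):
        x = -math.pi + (i + 0.5) * h
        fx = f(x)
        s -= fx * math.log2(fx) * h
    return s

# White noise, unit variance: f ≡ 1/(2π), so I = log2(2π).
white = spectral_entropy(lambda x: 1 / (2 * math.pi))

# AR(1) with coefficient 0.8, scaled to unit variance:
# f(x) = (1 - φ²) / (2π (1 - 2φ cos x + φ²)).
phi = 0.8
ar1 = spectral_entropy(
    lambda x: (1 - phi ** 2) / (2 * math.pi * (1 - 2 * phi * math.cos(x) + phi ** 2)))
```

The peak of the AR(1) spectrum at frequency zero pulls its entropy below the white-noise maximum, as the text describes.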

3.2.2 Measuring metric, melodic, and harmonic importance

General idea

Western classical music is usually structured in at least three aspects: melody, metric structure, and harmony. With respect to representing the essential melodic, metric, and harmonic structures, not all notes are equally important. For a given composition K, we may therefore try to find metric, melodic, and harmonic structures and quantify them in a weight function w : K → R3 (which we will also call an "indicator"). For each note event x ∈ K, the three components of w(x) = (wmelodic(x), wmetric(x), wharmonic(x)) quantify the "importance" of x with respect to the melodic, metric, and harmonic structure of the composition, respectively.

Omnibus metric, melodic, and harmonic indicators

Specific definitions of structural indicators (or weight functions) are discussed, for instance, in Mazzola et al. (1995), Fleischer et al. (2000), and Beran and Mazzola (2001). To illustrate the general approach, we give a full definition of metric weights. Melodic and harmonic weights are defined in a similar fashion, taking into account the specific nature of melodic and harmonic structures, respectively.

Metric structures characterize local periodic patterns in symbolic onset times. This can be formalized as follows: let K ⊂ Z4 be a composition (with coordinates "Onset Time", "Pitch", "Loudness", and "Duration"), T ⊂ Z its set of onset times (i.e. the projection of K onto the first axis), and let tmax = max{t : t ∈ T}. Without loss of generality, the smallest onset time in T is equal to one.

Definition 28 For each triple (t, l, p) ∈ Z × N × N the set

B(t, l, p) = {t + kp : 0 ≤ k ≤ l}

is called a meter with starting point t, length l, and period p. The meter is called admissible if B(t, l, p) ⊂ T. The non-negative length l of a local meter M = B(t, l, p) is uniquely determined by the set M and is denoted by l(M).

Note that by definition, t ∈ B(t, l, p) for any (t, l, p) ∈ Z × N × N. The importance of events at onset time s is now measured by the number of meters this onset is contained in. For a given triple (t, l, p), three situations can occur:


1. B(t, l, p) is admissible and there is no other admissible local meter B′ = B(t′, l′, p′) such that B ⊊ B′;

2. B(t, l, p) is not admissible;

3. B(t, l, p) is admissible, but there is another admissible local meter B′ = B(t′, l′, p′) such that B ⊊ B′.

We count only case 1. This leads to the following definition:

Definition 29 An admissible meter B(t, l, p) for a composition K ⊂ Z4 is called a maximal local meter if and only if it is not a proper subset of another admissible local meter B(t′, l′, p′) of K. Denote by M(K) the set of maximal local meters of K and by M(K, t) the set of maximal local meters of K containing onset t.

Note that the set M(K) is always a covering of T. Metric weights can now be defined, for instance, by

Definition 30 Let x ∈ K be a note event at onset time t(x) ∈ T, M = M(K, t) the set of maximal local meters of K containing t(x), and h a nondecreasing real function on Z. Specify a minimal length lmin. Then the metric indicator (or metric weight) of x, associated with the minimal length lmin, is given by

wmetric(x) = Σ_{M ∈ M, l(M) ≥ lmin} h(l(M)). (3.12)

In a similar fashion, melodic indicators wmelodic and harmonic indicators wharmonic can be derived from a melodic and a harmonic analysis, respectively.
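Definitions 28–30 lend themselves to a small brute-force computation. The following sketch is illustrative Python, not the book's implementation; the example weight h(l) = l and the set-based enumeration are assumptions made here:

```python
def maximal_local_meters(onsets):
    """All maximal local meters B(t, l, p) contained in the onset set
    (Definitions 28-29), found by extending arithmetic progressions."""
    T = set(onsets)
    span = max(T) - min(T)
    cands = set()
    for t in T:
        for p in range(1, span + 1):
            if t - p in T:
                continue  # t is not the starting point of this progression
            l = 0
            while t + (l + 1) * p in T:
                l += 1
            if l >= 1:
                cands.add(frozenset(t + k * p for k in range(l + 1)))
    # keep only meters that are not proper subsets of another candidate
    return [M for M in cands if not any(M < M2 for M2 in cands)]

def metric_weight(onsets, lmin=2, h=lambda l: l):
    """Metric weight (3.12): sum of h(l(M)) over maximal local meters
    of length >= lmin containing the onset."""
    meters = maximal_local_meters(onsets)
    return {s: sum(h(len(M) - 1) for M in meters
                   if s in M and len(M) - 1 >= lmin)
            for s in onsets}
```

For the onset set {1, 2, 3, 4, 5}, the only maximal local meter is the full set (period 1, length 4), so every onset receives weight h(4) = 4; short incidental meters are suppressed by lmin.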

Specific indicators

A possible objection to weight functions as defined above is that only information about pitch and onset time is used. A score, however, usually contains much more symbolic information that helps musicians to read it correctly. For instance, melodic phrases are often connected by a phrasing slur, notes are grouped by beams, separate voices are made visible by suitable orientation of note stems, etc. Ideally, structural indicators should take into account such additional information. An improved indicator that takes into account knowledge about musical "motifs" can be defined, for example, as follows:

Definition 31 Let M = {(τ1, y1), ..., (τk, yk)}, τ1 < τ2 < ... < τk, be a "motif", where y denotes pitch and τ onset time. Given a composition K ⊂ T × Z ⊂ Z2, define for each score-onset time ti ∈ T (i = 1, ..., n) and u ∈ {1, ..., k} the shifted motif

M(ti, u) = {(ti + τ1 − τu, y1), ..., (ti + τk − τu, yk)}


and denote by

Tu(ti) = {ti + τ1 − τu, ..., ti + τk − τu} = {s1, ..., sk}

the corresponding onset times. Moreover, let

Xu(ti) = {x = (x(s1), ..., x(sk)) : (si, x(si)) ∈ K}

be the set of all pitch vectors with onset set Tu(ti). Then we define the distance

du(ti) = min_{x ∈ Xu(ti)} Σ_{i=1}^k (x(si) − yi)². (3.13)

If Xu(ti) is empty, then du(ti) is not defined, or is set equal to an arbitrary upper bound D < ∞.

In this definition, it is assumed that the motif is identified beforehand by other means (e.g. "by hand", using traditional musical analysis). The distance du(ti) thus measures to what extent there are notes similar to those in M if ti is at the uth place of the rhythmic pattern of motif M. Note that the Euclidean distance Σ(x(si) − yi)² could be replaced by any other reasonable distance. Analogously, distance or similarity can be measured by correlation:

Definition 32 Using the same definitions as above, let

xo = arg min_{x ∈ Xu(ti)} Σ_{i=1}^k (x(si) − yi)²,

and define ru(ti) to be the sample correlation between xo and y = (y1, ..., yk). If M(ti, u) ⊄ K, then set ru(ti) = 0.

Disregarding the position within a motif, we can now define overall motivic indicators (or weights), for instance by

wd,mean(ti) = g(Σ_{u=1}^k du(ti)) (3.14)

where g is a monotonically decreasing function,

wd,min(ti) = min_{1 ≤ u ≤ k} du(ti), (3.15)

or

wcorr(ti) = max_{1 ≤ u ≤ k} ru(ti). (3.16)

Finally, given weights for p different motifs, we may combine these into one overall indicator. For instance, an overall melodic indicator based on correlations can be defined by

wmelod(ti) = Σ_{j=1}^p h(wcorr,j(ti), Li) (3.17)


where wcorr,j is the weight function for motif number j and Li is the number of elements in the motif. Including Li has the purpose of attributing higher weights to the presence of longer motifs.

The advantage of the motif-based definition is that one can first search for possible motifs in the score, making full use of the available information in the score as well as of musicological and historical knowledge, and then incorporate these in the definition of melodic weights. Similar definitions may be obtained for metric and harmonic indicators.
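For a monophonic score, the weight wd,min of (3.15) reduces to a short loop over motif positions. The sketch below is illustrative (the dict-based score representation and the default bound D are assumptions made here, following the convention stated after (3.13)):

```python
def w_d_min(score, motif, D=10 ** 6):
    """w_d,min(t_i) = min over u of d_u(t_i), per (3.13) and (3.15).
    score: dict onset -> pitch (monophonic simplification);
    motif: list of (tau, y) pairs with increasing tau."""
    taus = [tau for tau, _ in motif]
    ys = [y for _, y in motif]
    k = len(motif)
    weights = {}
    for ti in score:
        best = D  # bound D is used when no shifted motif fits the score
        for u in range(k):
            onsets = [ti + tau - taus[u] for tau in taus]
            if all(s in score for s in onsets):
                d = sum((score[s] - y) ** 2 for s, y in zip(onsets, ys))
                best = min(best, d)
        weights[ti] = best
    return weights
```

If the score contains the motif exactly, the onsets inside the occurrence get weight 0; elsewhere the weight is the squared distance to the best-fitting shifted copy.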

3.2.3 Measuring dimension

There are many different definitions of dimension, each measuring a specific aspect of "objects". Best known is the topological dimension. In the usual Euclidean space Rk with scalar product ⟨x, y⟩ = Σ_{i=1}^k xi yi and distances |x − y| = √⟨x − y, x − y⟩, the topological dimension of the space is equal to k. The dimension of an object in this space is equal to the dimension of the subspace it is contained in. The Euclidean space is, however, rather special, since it is metric with a scalar product.

More generally, one can define a topological dimension in any topological (not necessarily metric) space in terms of coverings. We start with the definition of a topological space: a topological space is a nonempty set X together with a family O of so-called open subsets of X satisfying the following conditions:

1. X ∈ O and ∅ ∈ O (∅ denotes the empty set);
2. If U1, U2 ∈ O, then U1 ∪ U2 ∈ O;
3. If U1, U2 ∈ O, then U1 ∩ U2 ∈ O.

A covering of a set S ⊆ X is a collection U ⊆ O of open sets such that

S ⊆ ∪_{U ∈ U} U.

A refinement of a covering U is a covering U∗ such that for each U∗ ∈ U∗ there exists a U ∈ U with U∗ ⊆ U. The definition of topological dimension is now as follows:

Definition 33 A topological space X has topological dimension m if every covering U of X has a refinement U∗ in which every point of X occurs in at most m + 1 sets of U∗, and m is the smallest such integer.

The topological dimension of a subset S ⊆ X is defined analogously. For instance, a straight line in a Euclidean space can be divided into open intervals such that at most two intervals intersect, so that dT = 1. Similarly, a simple geometric figure in the plane, such as a disk or a rectangle (including the inner area), can be covered with arbitrarily small circles or rectangles such that at most three such sets intersect; this number cannot be made smaller. Thus, the topological dimension of such an object is dT = 3 − 1 = 2.


The topological dimension is a relatively rough measure of dimension, since it can assume integer values only and thus classifies sets (in a topological space) into a finite or countable number of categories. On the other hand, dT is defined for very general spaces where a metric (i.e. distances) need not exist. A finer definition of dimension, which is however confined to metric spaces, is the Hausdorff-Besicovitch dimension. Suppose we have a set A in a metric space X. In a metric space, we can define open balls of radius r around each point x ∈ X by

U(r) = {y ∈ X : dX(x, y) < r}

where dX is the metric in X. The idea is now to measure the size of A by covering it with a finite number of balls Ur = {U1(r), ..., Uk(r)} of radius r and to calculate an approximate measure of A by

µ_{Ur,r,h}(A) = Σ h(r) (3.18)

where the sum is taken over all balls and h is some positive function. This measure depends on r, the specific covering Ur, and h. To obtain a measure that is independent of a specific covering, we define the measure

µ_{r,h}(A) = inf_{Uρ : ρ < r} µ_{Uρ,ρ,h}(A). (3.19)

This measure is still only an approximation of A. The question is now whether we can get a measure that corresponds exactly to the set A. This is done by taking the limit r → 0:

µh(A) = lim_{r→0} µ_{r,h}(A). (3.20)

Clearly, as r tends to zero, µ_{r,h} is non-decreasing and therefore has a limit. The limit can be zero (if µ_{r,h} = 0 already), infinity, or a finite positive number. This leads to the following definition:

Definition 34 A function h for which

0 < µh(A) <∞is called intrinsic function of A.Consider, for example, a simple shape in the plane such as a circle withradius R. The area of the circle A can be measured by covering it by smallcircles of radius r and evaluating µh(A) using the function h(r) = πr2.It is well known that limr→0 µr,h(A) exists and is equal to µh(A) = πR2.On the other hand, if we took h(r) = πrα with α < 2, then µh(A) = ∞,whereas for α > 2, µh(A) = 0. For standard sets, such as circles, rectangles,triangles, cylinders, etc., it is generally true that the intrinsic function for aset A that with topological dimension dT = d is given by (Hausdorff 1919)

\[ h(r) = h_d(r) = \frac{\{\Gamma(\tfrac{1}{2})\}^d}{\Gamma(1 + \tfrac{d}{2})}\, r^d. \qquad (3.21) \]
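As a quick numerical check of (3.21), the constant Γ(1/2)^d / Γ(1 + d/2) reproduces the familiar volume coefficients of standard sets. The following sketch (function name is ours, not from the text) evaluates h_d(r):

```python
from math import gamma, pi

def h_d(r, d):
    """Intrinsic function (3.21) of a standard set of topological dimension d:
    the volume of a d-dimensional ball of radius r."""
    return gamma(0.5) ** d / gamma(1 + d / 2) * r ** d
```

For d = 2 this gives πr² (the disk area used in the example above), and for d = 3 the ball volume (4/3)πr³.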

©2004 CRC Press LLC


Many other more complicated sets, including randomly generated sets, have intrinsic functions of the form h(r) = L(r)r^d for some d > 0 which is not always equal to dT, and L a function that is slowly varying at the origin (see e.g. Hausdorff 1919, Besicovitch 1935, Besicovitch and Ursell 1937, Mandelbrot 1977, 1983, Falconer 1985, 1986, Kono 1986, Telcs 1990, Devaney 1990). Here, L is called slowly varying at zero if, for any u > 0, lim_{r→0}[L(ur)/L(r)] = 1. This leads to the following definition of dimension:

Definition 35 Let A be a subset of a metric space and

\[ h(r) = L(r) \cdot r^d \]

an intrinsic function of A, where L(r) is slowly varying. Then d_H = d is called the Hausdorff-Besicovitch dimension (or Hausdorff dimension) of A.

The definition of Hausdorff dimension leads to the definition of fractals (see e.g. Mandelbrot 1977):

Definition 36 Let A be a subset of a metric space. Suppose that A has topological dimension dT and Hausdorff dimension dH such that

\[ d_H > d_T. \]

Then A is called a fractal.

Figure 3.2 Fractal pictures (by Celine Beran, computer generated). (Color figures follow page 152.)

Intuitively, dH > dT means that the set A is "more complicated" than a standard set with topological dimension dT. An alternative definition of the Hausdorff dimension is the fractal dimension:

Definition 37 Let A be a compact subset of a metric space. For each ε > 0, denote by N(ε) the smallest number of balls of radius r ≤ ε necessary to cover A. If

\[ d_F = -\lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log \varepsilon} \qquad (3.22) \]

exists, then d_F is called the fractal dimension of A.

It can be shown that d_F ≥ d_T. Moreover, in R^k one has d_F ≤ k = d_T. Beautiful examples of fractal curves and surfaces (cf. Figure 3.2) can be found in


Mandelbrot (1977) and other related books. Many phenomena, not only in nature but also in art, appear to be fractal. For instance, fractal shapes can be found in Jackson Pollock's (1912-1956) abstract drip paintings (Taylor 1999a,b,c, 2000). In music, the idea of fractals was used by some contemporary composers, though mainly as a conceptual inspiration rather than an exact algorithm (e.g. Harri Vuori, György Ligeti; Figure 3.3).

Figure 3.3 György Ligeti (*1923). (Courtesy of Philippe Gontier, Paris.)

The notion of fractals is closely related to self-similarity (see Mandelbrot 1977 and references therein). Self-similar geometric objects have the property that the same shapes are repeated at infinitely many scales. By drawing recursively m smaller copies of the same shape – rescaling them by a factor s – one can construct fractals. For self-similar objects, the fractal dimension can be calculated directly from the scaling factor s and the number m of repetitions of the rescaled objects by

\[ d_F = \frac{\log m}{\log s} \qquad (3.23) \]
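For the middle-third Cantor set (m = 2 copies rescaled by s = 3), formula (3.23) and the box-counting limit (3.22) agree, which the following minimal sketch verifies (helper names are ours):

```python
from math import log

def fractal_dim(m, s):
    """Fractal dimension (3.23) of a self-similar set built from m copies
    of itself, each rescaled by the factor s."""
    return log(m) / log(s)

# Middle-third Cantor set: covering it at scale eps = 3**-k requires
# N(eps) = 2**k intervals, so -log N(eps)/log eps matches (3.22).
d_cantor = fractal_dim(2, 3)      # strictly between dT = 0 and 1
k = 20
d_box = -log(2.0 ** k) / log(3.0 ** -k)
```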

For many purposes more realistic are random fractals, where instead of the shape itself the distribution remains the same after rescaling. More specifically, we have

Definition 38 Let X_t (t ∈ R) be a stochastic process. The process is called self-similar with self-similarity parameter H if, for any c > 0,

\[ X_t \stackrel{d}{=} c^{-H} X_{ct} \]

where \stackrel{d}{=} means equality of the two processes in distribution.

The parameter H is also called Hurst exponent. Self-similar processes are (like their deterministic counterparts) very special models. However, they play a central role for stochastic processes, just like the normal distribution for random variables. The reason is that, under very general conditions, the limit of partial sum processes (see Lamperti 1962, 1972) is always a self-similar process:


Theorem 11 Suppose that Z_t (t ∈ R_+) is a stochastic process such that Z_1 ≠ 0 with positive probability and Z_t is the limit in distribution of the sequence of normalized partial sums

\[ a_n^{-1} S_{nt} = a_n^{-1} \sum_{s=1}^{[nt]} X_s \quad (n = 1, 2, ...) \qquad (3.24) \]

where X_1, X_2, ... is a stationary discrete time process with zero mean and a_1, a_2, ... a sequence of positive normalizing constants such that log a_n → ∞. Then there exists an H > 0 such that for any u > 0, lim_{n→∞}(a_{[nu]}/a_n) = u^H, Z_t is self-similar with self-similarity parameter H, and Z_t has stationary increments.

The self-similarity parameter therefore also makes sense for processes that are not exactly self-similar themselves, since it is defined by the rate n^{-H} needed to standardize partial sums. Moreover, H is related to the fractal dimension; the exact relationship between H and the fractal dimension, however, depends on some other properties of the process as well. For instance, sample paths of (univariate) Gaussian self-similar processes, so-called fractional Brownian motion (see Chapter 4), have, with probability one, a fractal dimension of 2 − H, with possible values of H in the interval (0, 1). Thus, the closer H is to 1, the more a sample path is similar to a simple geometric line with dimension one. On the other hand, as H approaches zero, a typical sample path fills up most of the plane so that the dimension approaches two. Practically, H can be determined from an observed series X_1, ..., X_n, for example by maximum likelihood estimation.

For a thorough discussion of self-similar and related processes and statistical methods see e.g. Beran (1994). Further references on fractals apart from those given above are, for instance, Edgar (1990), Falconer (1990), Peitgen and Saupe (1988), Stoyan and Stoyan (1994), and Tricot (1995). A cautionary remark should be made at this point: in view of Theorem 11, the fact that we do find self-similarity in aggregated time series is hardly surprising and can therefore not be interpreted as something very special that would distinguish the particular series from other data. What may be special at most is which particular value of H is obtained and which particular self-similar process the normalized aggregated series converges to.

3.3 Specific applications in music

3.3.1 Entropy of melodic shapes

Let x(ti) be the upper and y(ti) the lower envelope of a composition atscore-onset times ti (i = 1, ..., n). To investigate the shape of the melodic


movement we consider the first and second discrete "derivatives"

\[ x^{(1)}(t_i) = \frac{\Delta x(t_i)}{\Delta t_i} = \frac{x(t_{i+1}) - x(t_i)}{t_{i+1} - t_i} \qquad (3.25) \]

and

\[ x^{(2)}(t_i) = \frac{\Delta^2 x(t_i)}{\Delta^2 t_i} = \frac{[x(t_{i+2}) - x(t_{i+1})] - [x(t_{i+1}) - x(t_i)]}{[t_{i+2} - t_{i+1}] - [t_{i+1} - t_i]} \qquad (3.26) \]

Alternatively, if octaves "do not count", we define

\[ x^{(1;12)}(t_i) = \frac{[x(t_{i+1}) - x(t_i)]_{12}}{t_{i+1} - t_i} \qquad (3.27) \]

and

\[ x^{(2;12)}(t_i) = \frac{[x(t_{i+2}) - x(t_{i+1})]_{12} - [x(t_{i+1}) - x(t_i)]_{12}}{[t_{i+2} - t_{i+1}] - [t_{i+1} - t_i]} \qquad (3.28) \]

where [x]_k = x mod k. Thus, in this definition, intervals between successive notes x(t_i), x(t_{i+1}) and x(t_j), x(t_{j+1}), respectively, are considered identical if they differ by octaves only.
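The first derivative (3.25) and its octave-free variant (3.27) can be sketched as follows, assuming pitch is coded in semitones (MIDI-style numbers; the toy data are hypothetical):

```python
def d1(x, t):
    """First discrete derivative (3.25) of an envelope x at onset times t."""
    return [(x[i + 1] - x[i]) / (t[i + 1] - t[i]) for i in range(len(x) - 1)]

def d1_mod12(x, t):
    """Octave-free variant (3.27): intervals reduced modulo 12 semitones."""
    return [((x[i + 1] - x[i]) % 12) / (t[i + 1] - t[i])
            for i in range(len(x) - 1)]

pitch = [60, 72, 67]      # C4, C5, G4 -- a rising octave, then a fourth down
onset = [0.0, 1.0, 2.0]
# The octave step contributes 12 to d1 but 0 to d1_mod12.
```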

The number of possible values of x^{(2)} and x^{(2;12)} is finite, but potentially very large. In a first approximation we may therefore consider both variables to be continuous. In the following, the distribution of x^{(2)} and x^{(2;12)} is approximated by a continuous kernel density estimate f (see Chapter 2). For illustration, we define the following measures of entropy:

1.
\[ E_1 = -\int f(x) \log_2 f(x)\, dx \qquad (3.29) \]
where f is obtained from the observed data x^{(2;12)}(t_1), ..., x^{(2;12)}(t_n) by kernel estimation.

2. E_2: Same as E_1, but using x^{(2)}(t_1), ..., x^{(2)}(t_n) instead.

3.
\[ E_3 = -\int f(x,y) \log_2 f(x,y)\, dx\, dy \qquad (3.30) \]
where f(x, y) is a kernel estimate based on observations (a_i, b_i) with a_i = x^{(2)}(t_{i-1}) and b_i = x^{(2)}(t_i). Thus, E_3 is the (empirical) entropy of the joint distribution of two successive values of x^{(2)}.

4. E_4: Same as Entropy 3, but using (x^{(2;12)}(t_{i-1}), x^{(2;12)}(t_i)) instead.

5. E_5: Same as Entropy 3, but using (x(t_i) − y(t_i))^{(1)} instead.

6. E_6: Same as Entropy 3, but using (x(t_i) − y(t_i))^{(1;12)} instead.

7. E_7: Same as Entropy 1, but using (x(t_i) − y(t_i))^{(1)} instead.

8. E_8: Same as Entropy 1, but using (x(t_i) − y(t_i))^{(1;12)} instead.
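E1 can be sketched numerically: estimate f with a Gaussian kernel and integrate −f log₂ f by a Riemann sum (the bandwidth and grid below are hypothetical choices, not from the text):

```python
import numpy as np

def entropy_kde(data, bw=0.5, grid=np.linspace(-15.0, 15.0, 3001)):
    """E1 = -∫ f(x) log2 f(x) dx (3.29), with f a Gaussian kernel density
    estimate of the data; the integral is a simple Riemann sum."""
    d = grid[:, None] - np.asarray(data, dtype=float)[None, :]
    f = np.exp(-0.5 * (d / bw) ** 2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))
    logf = np.zeros_like(f)
    np.log2(f, out=logf, where=f > 0)     # treat 0 * log 0 as 0
    return -np.sum(f * logf) * (grid[1] - grid[0])
```

A more uniform mixture of shapes (as found for Bach below) yields a larger value: for instance, entropy_kde([0.0, 10.0]) exceeds entropy_kde([0.0, 0.0]).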


Figure 3.4 Comparison of entropies 1, 2, 3, and 4 for J.S. Bach's Cello Suite No. I and R. Schumann's op. 15, No. 2, 3, 4, and 7, and op. 68, No. 2 and 16.


Each of these entropies characterizes the information content (or randomness) of certain aspects of melodic patterns in the upper and lower envelope. Figures 3.4a through d show boxplots of Entropies 1 through 4 for Bach and Schumann (Figure 3.8). The pieces considered here are: J.S. Bach – Cello Suite No. I (each of the six movements separately), Präludium und Fuge No. 1 and 8 from "Das Wohltemperierte Klavier" I (each piece separately); R. Schumann – op. 15, No. 2, 3, 4 and 7, and op. 68, No. 2 and 16. Obviously there is a difference between Bach and Schumann in all four entropy measures. In Bach's pieces, entropy is higher, indicating a more uniform mixture of local melodic shapes.

3.3.2 Spectral entropy of local interval variability

Consider the local variability of intervals y_i = x(t_{i+1}) − x(t_i) between successive notes. Specifically, we consider a moving "nearest neighbor" window [t_i, t_{i+4}] (i = 1, ..., n−4) and define local variances

\[ v_i = \frac{1}{4-1} \sum_{j=0}^{3} (y_{i+j} - \bar{y}_i)^2 \qquad (3.31) \]

where \bar{y}_i = 4^{-1} \sum_{j=0}^{3} y_{i+j}. Based on this, a SEMIFAR model is fitted to the time series z_i = log(v_i + 1/2) (see Chapter 4 for the definition of SEMIFAR models). The fitted spectral density f(λ; \hat{\theta}) is then used to define the spectral entropy

\[ E_9 = -\int_{-\pi}^{\pi} f(\lambda; \hat{\theta}) \log f(\lambda; \hat{\theta})\, d\lambda \qquad (3.32) \]

If octaves do not count, then intervals are circular, so an estimate of variability for circular data should be used. Here, we use R* = 2(1 − R) as defined in Chapter 7. To transform the range [0, 2] of R* to the real line, the logistic transformation is applied, defining

\[ z_i = \log\left( \frac{R^* + \varepsilon}{2 + \varepsilon - R^*} \right) \]

where ε is a small positive number that is needed in order that −∞ < z_i < ∞ even if R* = 0 or 2, respectively. Fitting a SEMIFAR model to z_i, we then define E_10 the same way as E_9 above.
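The spectral entropy (3.32) can be sketched for any fitted spectral density. Here we use the AR(1) density of Section 4.2.4 (normalized to unit variance) as a stand-in for a SEMIFAR fit; function names and parameter values are illustrative only:

```python
import numpy as np

lam = np.linspace(-np.pi, np.pi, 4001)

def ar1_spectrum(phi, sigma2=1.0):
    """Spectral density of an AR(1) process (cf. Section 4.2.4)."""
    return sigma2 / (2 * np.pi) / np.abs(1 - phi * np.exp(1j * lam)) ** 2

def spectral_entropy(f):
    """E9 = -∫ f(λ) log f(λ) dλ over [-π, π] (3.32), by a Riemann sum."""
    return -np.sum(f * np.log(f)) * (lam[1] - lam[0])

# A flat spectrum (no serial organization) has entropy log(2π); a strongly
# peaked AR(1) spectrum with the same variance has a smaller entropy.
e_flat = spectral_entropy(ar1_spectrum(0.0))
e_peaked = spectral_entropy(ar1_spectrum(0.9, sigma2=1 - 0.81))
```

This mirrors the interpretation below: a low spectral entropy indicates a high degree of organization in the sequence.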

Figure 3.6 shows a comparison of E_9 and E_10 for the same compositions as in Section 3.3.1. In contrast to the previous measures of entropy, Bach is consistently lower than Schumann. With respect to E_10, this is also the case in comparison with Scriabin (Figure 3.5) and Martin. Thus, for Bach there appears to be a high degree of nonrandomness (i.e. organization) in the way the variability of interval steps changes sequentially.


Figure 3.5 Alexander Scriabin (1871-1915) (at the piano) and the conductor Serge Koussevitzky. (Painting by Robert Sterl, 1910; courtesy of Gemäldegalerie Neuer Meister, Dresden, and Robert-Sterl-House.)

Figure 3.6 Comparison of entropies 9 and 10 for Bach, Schumann, and Scriabin/Martin.


3.3.3 Omnibus metric, melodic, and harmonic indicators for compositions by Bach, Schumann, and Webern

Figures 3.7, and 3.9 through 3.11 show the "omnibus" metric, melodic, and harmonic weight functions for Bach's Canon cancricans, Schumann's op. 15/2 and 7, and for Webern's Variations op. 27. For Bach's composition, the almost perfect symmetry around the middle of the composition can be seen. Moreover, the metric curve exhibits a very regular up and down. Schumann's curves, in particular the melodic one, show clear periodicities. This appears to be quite typical for Schumann and becomes even clearer when plotting a kernel-smoothed version of the curves (here a bandwidth of 8/8 was used). Interestingly, this type of pattern can also be observed for Webern. In view of the historic development of 12-tone music as a logical continuation of the harmonic freedom and romantic gesture achieved in the 19th and early 20th centuries, this similarity is not completely unexpected. Finally, note that a relationship between metric,

Figure 3.7 Metric, melodic, and harmonic global indicators for Bach's Canon cancricans.

melodic, and harmonic structure cannot be seen directly from the "raw" curves. However, smoothed weights as shown in the figures above reveal clear connections between the three weight functions. This is even the case for Webern, in spite of the absence of tonality.


Figure 3.8 Robert Schumann (1810-1856). (Courtesy of Zentralbibliothek Zürich.)

3.3.4 Specific melodic indicators for Schumann's Träumerei

Schumann's Träumerei is rich in local motifs. Here, we consider eight of these, as indicated in Figure 3.12. Figure 3.13 displays the individual indicator functions obtained from (3.16). The overall indicator function m(t) = w_melod(t) displayed in Figure 3.15 is defined by (3.17) with h(w, L) = [2 · max(w, 0.5)]^L and L_j = number of notes in motif j. The contributions h(w_corr,j(t_i), L_j) of w_corr,j (j = 1, ..., 8) are given in Figure 3.14.


Figure 3.9 Metric, melodic, and harmonic global indicators for Schumann's op. 15, No. 2 (upper figure), together with smoothed versions (lower figure).


Figure 3.10 Metric, melodic, and harmonic global indicators for Schumann's op. 15, No. 7 (upper figure), together with smoothed versions (lower figure).


Figure 3.11 Metric, melodic, and harmonic global indicators for Webern's Variations op. 27, No. 2 (upper figure), together with smoothed versions (lower figure).


Figure 3.12 R. Schumann – Träumerei: motifs used for specific melodic indicators.


Figure 3.13 R. Schumann – Träumerei: indicators of individual motifs.

Figure 3.14 R. Schumann – Träumerei: contributions of individual motifs to overall melodic indicator.


Figure 3.15 R. Schumann – Träumerei: overall melodic indicator (weight w as a function of onset time).


CHAPTER 4

Time series analysis

4.1 Musical motivation

Musical events are ordered according to a specific temporal sequence. Time series analysis deals with observations that are indexed by an ordered variable (usually time). It is therefore not surprising that time series analysis is important for analyzing musical data. Traditional applications are concerned with "raw physical data" in the form of audio signals (e.g. digital CD-recording, sound analysis, frequency recognition, synthetic sounds, modeling musical instruments). In the last few years, time series models have been developed for modeling symbolic musical data and analyzing "higher level" structures in musical performance and composition. A few examples are discussed in this chapter.

4.2 Basic principles

4.2.1 Deterministic and random components, basic definitions

Time series analysis in its most sophisticated form is a complex subject that cannot be summarized in one short chapter. Here, we briefly mention some of the main ingredients only. For a thorough systematic account of the topic we refer the reader to standard textbooks such as Priestley (1981a,b), Brillinger (1981), Brockwell and Davis (1991), Diggle (1990), Beran (1994), and Shumway and Stoffer (2000).

A time series is a family of (usually, but not necessarily) real variables X_t with an ordered index t. For simplicity, we assume that observations are taken at equidistant discrete time points t ∈ Z (or N). Usually, observations are random with certain deterministic components. For instance, we may have an additive decomposition X_t = µ(t) + U_t where U_t is such that E(U_t) = 0 and µ(t) is a deterministic function of t. One of the main aims of time series analysis is to identify the probability model that generated an observed time series x_1, ..., x_n. In the additive model this would mean estimating the mean function µ(t) and the probability distribution of the random sequence U_1, U_2, .... Note that a random sequence can also be understood as a function mapping positive integers t to the real numbers U_t.

The main difficulties in identifying the correct distribution are:


1. The probability law has to be defined on an infinite dimensional space of vectors (X_1, X_2, ...). This difficulty is even more serious for continuous time series where a sample path is a function on R;

2. The finite sample vector X^{(n)} = (X_1, ..., X_n)^t has an arbitrary n-dimensional distribution, so that it cannot be estimated from observed values x_1, ..., x_n consistently, unless some minimal assumptions are made.

Difficulty 1 can be solved by applying appropriate mathematical techniques and is described in detail in standard books on stochastic processes and time series analysis (see e.g. Billingsley 1986 and the references above). Difficulty 2 cannot be solved by mathematical arguments only. It is of course possible to give necessary or sufficient conditions such that the probability distribution can be estimated with arbitrary accuracy (measured in an appropriate sense) as n tends to infinity. However, which concrete assumptions should be used depends on the specific application. Assumptions should neither be too general (otherwise population quantities cannot be estimated) nor too restrictive (otherwise results are unrealistic).

A standard, and almost necessary, assumption is that X_t can be reduced

to a stationary process U_t by applying a suitable transformation. For instance, we may have a deterministic "trend" µ(i) plus stationary "noise" U_i,

\[ X_i = \mu(i) + U_i, \qquad (4.1) \]

or an integrated process of order m, for which the mth difference is stationary, i.e.

\[ (1-B)^m X_i = U_i \qquad (4.2) \]

where (1−B)X_i = X_i − X_{i−1}. In the latter case, X_t is called m-difference stationary. Stationarity is defined as follows:

Definition 39 A time series X_i is called strictly stationary if, for any k, i_1, ..., i_n ∈ N,

\[ P(X_{i_1} \le x_1, ..., X_{i_n} \le x_n) = P(X_{i_1+k} \le x_1, ..., X_{i_n+k} \le x_n) \qquad (4.3) \]

The time series is called weakly (or second order) stationary, if

\[ \mu(i) = E(X_i) = \mu = \text{const} \qquad (4.4) \]

and for any i, j ∈ N, the autocovariance depends on the lag k = |i − j| only, i.e.

\[ \text{cov}(X_i, X_{i+k}) = \gamma(k) = \gamma(-k) \qquad (4.5) \]

A second order stationary process can be decomposed into uncorrelated random components that correspond to periodic signals, via the so-called spectral representation

\[ X_t = \mu + \int_{-\pi}^{\pi} e^{it\lambda}\, dZ_X(\lambda). \qquad (4.6) \]

Here Z_X(λ) = Z_{X,1}(λ) + iZ_{X,2}(λ) ∈ C is a so-called orthogonal increment


process (in λ) with the following properties: Z_X(0) = 0, E[Z_X(λ)] = 0 and, for λ_1 > λ_2 ≥ ν_1 > ν_2,

\[ E[\Delta Z_X(\lambda_2, \lambda_1)\, \overline{\Delta Z_X(\nu_2, \nu_1)}] = 0 \qquad (4.7) \]

where ∆Z_X(u, v) = Z_X(u) − Z_X(v). The integral in (4.6) is defined as a limit in mean square. It can be constructed by approximating the function e^{itλ} by step functions

\[ g_n(\lambda) = \sum \alpha_{i,n}\, 1\{a_{i,n} < \lambda \le b_{i,n}\} \]

(n ∈ N). For step functions we have the integrals

\[ I_n = \int_{-\pi}^{\pi} g_n(\lambda)\, dZ_X(\lambda) = \sum \alpha_{i,n}[Z(b_{i,n}) - Z(a_{i,n})]. \]

As g_n → e^{itλ}, the integrals I_n converge to a random variable I, in the sense that

\[ \lim_{n \to \infty} E[(I - I_n)^2] = 0. \]

The random variable I is then denoted by ∫ exp(itλ) dZ(λ). The spectral

representation is especially useful when one needs to identify (random) periodicities. For this purpose one defines the spectral distribution function

\[ F_X(\lambda) = E[|Z_X(\lambda) - Z_X(0)|^2] = E[|Z_X(\lambda)|^2] \qquad (4.8) \]

The variance is then decomposed into frequency contributions by

\[ \text{var}(X_t) = \int_{-\pi}^{\pi} E[|dZ_X(\lambda)|^2] = \int_{-\pi}^{\pi} dF_X(\lambda) \qquad (4.9) \]

This means that the expected contribution (expected squared amplitude) of components with frequencies in the interval (λ, λ+ε] to the variance of X_t is equal to F(λ+ε) − F(λ). Two interesting special cases can be distinguished:

Case 1 – F differentiable: In this case,

\[ F(\lambda + \varepsilon) - F(\lambda) = \frac{d}{d\lambda} F(\lambda)\, \varepsilon + o(\varepsilon) = f(\lambda)\varepsilon + o(\varepsilon). \]

The function f is called spectral density and can also be defined directly by

\[ f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma_X(k) e^{ik\lambda} \qquad (4.10) \]

where γ_X(k) = cov(X_t, X_{t+k}). The inverse relationship is

\[ \gamma_X(k) = \int_{-\pi}^{\pi} e^{ik\lambda} f(\lambda)\, d\lambda \qquad (4.11) \]

A high peak of f at a frequency λ_o means that the component(s) at (or in the neighborhood of) λ_o contribute largely to the variability of X_t. Note that the period of exp(itλ), as a function of t, is T = 2π/λ (sometimes one therefore defines λ/(2π) as the frequency, in order that the period T is directly the inverse of the frequency). Thus, a peak of f at λ_o implies that a sample path of X_t is likely to exhibit a strong periodic component with frequency λ_o. Periodicity is, however, random – the observed series is not a periodic function. The meaning of random periodicity can be explained best in the simplest case where T is an integer: if f has a peak at frequency λ_o = 2π/T, then the correlation between X_t and X_{t+jT} (j ∈ Z) is relatively high compared to other correlations with similar lags. A further complication that blurs periodicity is that, if f is continuous around a peak at λ_o, then the observed signal is a weighted sum of infinitely (in fact uncountably) many, relatively large components with frequencies that are similar to λ_o. The sharper the peak, the less this "blurring" takes place, and a distinct periodicity (though still random) can be seen. In the other extreme case where f is constant, there is no preference for any frequency, and γ_X(k) = 0 (k ≠ 0), i.e. observations are uncorrelated.

Case 2 – F is a step function with a finite or countable number of jumps: this corresponds to processes of the form

\[ X_t = \sum_{j=1}^{k} A_j e^{i\lambda_j t} \]

for some k ≤ ∞, and λ_j ∈ [0, π], A_j ∈ C. We then have

\[ F(\lambda) = \sum_{j: \lambda_j \le \lambda} E[|A_j|^2], \qquad (4.12) \]

\[ \text{var}(X_t) = \sum_{j=1}^{k} E[|A_j|^2] \qquad (4.13) \]

This means that the variance is a sum of contributions that are due to the frequencies λ_j (1 ≤ j ≤ k). A sample path of X_t cannot be distinguished from a deterministic periodic function, because the randomly selected amplitudes A_j are then fixed.

Finally, it should be noted that not all frequencies are observable when observations are taken at discrete time points t = 1, 2, ..., n. The smallest identifiable period is 2, which corresponds to the highest observable frequency of 2π/2 = π. The largest identifiable period is n/2, which corresponds to the smallest frequency 4π/n. As n increases, the lowest frequency tends to zero; however, the highest does not. In other words, the highest frequency resolution does not improve with increasing sample size.

To obtain more general models, one may wish to relax the condition of stationarity. An asymptotic concept of local stationarity is defined in Dahlhaus (1996a,b, 1997): a sequence of stochastic processes X_{t,n}


(n ∈ N) is called locally stationary, if we have a spectral representation

\[ X_{t,n} = \mu\!\left(\frac{t}{n}\right) + \int_{-\pi}^{\pi} e^{it\lambda} A_{t,n}(\lambda)\, dZ_X(\lambda), \qquad (4.14) \]

with "=" meaning almost sure (a.s.) equality, µ(u) continuous, and there exists a 2π-periodic function A : [0, 1] × R → C such that A(u, −λ) = \overline{A(u, \lambda)}, A(u, λ) is continuous in u, and

\[ \sup_{t,\lambda} \left| A\!\left(\frac{t}{n}, \lambda\right) - A_{t,n}(\lambda) \right| \le c\, n^{-1} \qquad (4.15) \]

(a.s.) for some constant c < ∞. Intuitively, this means that for n large enough, the observed process can be approximated locally in a small time window t ± ε by the stationary process ∫ exp(itλ) A(t/n, λ) dZ_X(λ). The order n^{-1} of the approximation is chosen such that most standard estimation procedures, such as maximum likelihood estimation, can be applied locally and their usual properties (e.g. consistency, asymptotic normality) still hold. Under smoothness conditions on A one can prove that a meaningful "evolving" spectral density f_X(u, λ) (u ∈ (0, 1)) exists such that

\[ f_X(u, \lambda) = \lim_{n \to \infty} \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \text{cov}(X_{[u \cdot n - k/2],n}, X_{[u \cdot n + k/2],n})\, e^{ik\lambda} \qquad (4.16) \]

The function f_X(u, λ) is called evolutionary spectral density. Note that, for fixed u,

\[ \lim_{n \to \infty} \text{cov}(X_{[u \cdot n - k/2],n}, X_{[u \cdot n + k/2],n}) = \gamma_X(k) = \int e^{ik\lambda} f_X(u, \lambda)\, d\lambda. \]

Thumfart (1995) carries this concept over to series with discrete spectra. A simplified definition can be given as follows: a sequence of stochastic processes X_{t,n} (n ∈ N) is said to have a discrete evolutionary spectrum F_X(u, λ), if

\[ X_{t,n} = \mu\!\left(\frac{t}{n}\right) + \sum_{j \in M} A_j\!\left(\frac{t}{n}\right) e^{i\lambda_j(\frac{t}{n})\, t} \qquad (4.17) \]

where M ⊆ Z, and the functions A_j(u) and λ_j(u) are twice continuously differentiable. The discrete evolutionary spectrum can be defined in analogy to the continuous case. For other definitions of nonstationary processes see e.g. Priestley (1965, 1981), Ghosh et al. (1997) and Ghosh and Draghicescu (2002a,b).

4.2.2 Sampling of continuous-time time series

Often time series observed at discrete time points t = j ·∆τ (j = 1, 2, 3, ...)actually “happen” in continuous time τ ∈ R. Sampling in discrete time


leads to information loss in the following way: let Y_τ be a second order stationary time series with τ ∈ R. (Stationarity in continuous time is defined in exact analogy to Definition 39.) Then Y_τ has a spectral representation

\[ Y_\tau = \int_{-\infty}^{\infty} e^{i\tau\lambda}\, dZ_Y(\lambda), \qquad (4.18) \]

a spectral distribution function

\[ F_Y(\lambda) = \int_{-\infty}^{\lambda} E[|dZ_Y(\lambda)|^2] \qquad (4.19) \]

and, if F' exists, a spectral density function

\[ f_Y(\lambda) = F'(\lambda) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\tau\lambda} \gamma_Y(\tau)\, d\tau \qquad (4.20) \]

We also have

\[ \gamma_Y(\tau) = \text{cov}(Y_t, Y_{t+\tau}) = \int e^{i\lambda\tau} f(\lambda)\, d\lambda. \]

The reason why the frequency range extends to (−∞, ∞), instead of [−π, π], is that in continuous time, by definition, arbitrarily high frequencies (i.e. arbitrarily small periods) are observable.

Suppose now that Y_τ is observed at discrete time points t = j · ∆τ, i.e. we observe

\[ X_t = Y_{j \cdot \Delta\tau} \qquad (4.21) \]

Then we can write

\[ X_t = \int_{-\infty}^{\infty} e^{ij(\Delta\tau \lambda)}\, dZ_Y(\lambda) = \sum_{u=-\infty}^{\infty} \int_{-\pi/\Delta\tau + (2\pi/\Delta\tau)u}^{\pi/\Delta\tau + (2\pi/\Delta\tau)u} e^{ij(\Delta\tau\lambda)}\, dZ_Y(\lambda) \qquad (4.22) \]

\[ = \sum_{u=-\infty}^{\infty} \int_{-\pi/\Delta\tau}^{\pi/\Delta\tau} e^{ij(\Delta\tau\lambda)}\, dZ_Y(\lambda + (2\pi/\Delta\tau)u) = \int_{-\pi/\Delta\tau}^{\pi/\Delta\tau} e^{it\lambda}\, dZ_X(\lambda) \qquad (4.23) \]

where

\[ dZ_X(\lambda) = \sum_{u=-\infty}^{\infty} dZ_Y(\lambda + (2\pi/\Delta\tau)u) \qquad (4.24) \]

Moreover, if Y_τ has spectral density f_Y, then the spectral density of X_t is

\[ f_X(\lambda) = \sum_{u=-\infty}^{\infty} f_Y(\lambda + (2\pi/\Delta\tau)u) \qquad (4.25) \]

for λ ∈ [−π/∆τ, π/∆τ].

This result can be interpreted as follows: a frequency λ > π/∆τ can be written as λ = λ_o + (2π/∆τ)j for some j ∈ N, where λ_o is in the interval [−π/∆τ, π/∆τ]. The contributions of the two frequencies λ and


λ_o to the observed function X_t (in discrete time) are confounded, i.e. they cannot be distinguished. Thus, if we observe a peak of f_X at a frequency λ ∈ (0, π/∆τ], then this may be due to any of the periodic components with periods 2π/(λ + (2π/∆τ)u), u = 0, 1, 2, ..., or a combination of these. This has, for instance, direct implications for the sampling of sound signals. Suppose that 22050 Hz (i.e. λ = 22050 · 2π ≈ 138544.2) is the highest frequency that we want to identify (and later reproduce) correctly, instead of attributing it to a lower frequency. This would cover the range perceivable by the human ear. Then ∆τ must be so small that π/∆τ ≥ 22050 · 2π. Thus the time gap ∆τ between successive measurements of the sound wave must not exceed 1/44100.
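The confounding can be demonstrated directly: sampled at ∆τ = 1/44100, a 25000 Hz cosine (above the 22050 Hz limit) is indistinguishable from a 19100 Hz cosine, since 25000 − 44100 = −19100. A small sketch, using the CD rates from the text:

```python
import numpy as np

dtau = 1.0 / 44100.0              # sampling interval
t = np.arange(200) * dtau         # 200 sampling points

# 25000 Hz lies above the Nyquist limit; its alias is 19100 Hz:
high = np.cos(2 * np.pi * 25000.0 * t)
low = np.cos(2 * np.pi * 19100.0 * t)
# high and low agree at every sampling point (up to rounding error).
```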

4.2.3 Linear filters

Suppose we need to extract or eliminate frequency components from a signal X_t with spectral density f_X. The aim is thus, for instance, to produce an output signal Y_t whose spectral density f_Y is zero for a frequency interval a ≤ λ ≤ b. The simplest, though not necessarily best, way to do this is linear filtering. A linear filter maps an input series X_t to an output series Y_t by

\[ Y_t = \sum_{j=-\infty}^{\infty} a_j X_{t-j} \qquad (4.26) \]

The coefficients must fulfill certain conditions in order that the sum is defined. If X_t is second order stationary, then we need Σ a_j² < ∞. The resulting spectral density of Y_t is

\[ f_Y(\lambda) = |A(\lambda)|^2 f_X(\lambda) \qquad (4.27) \]

where

\[ A(\lambda) = \sum_{j=-\infty}^{\infty} a_j e^{-ij\lambda}. \qquad (4.28) \]

To eliminate a certain frequency band [a, b] one thus needs a linear filter such that A(λ) ≡ 0 in this interval.

Equation (4.27) also helps to construct and simulate time series models with desired spectral densities: a series with spectral density f_Y(λ) = (2π)^{-1}|A(λ)|² can be simulated by passing a series of independent observations X_t through the filter A(λ). Note that, in reality, one can use only a finite number of terms in the filter, so that only an approximation can be achieved.
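A minimal sketch of (4.26)-(4.28): a three-point moving average has transfer function A(λ) = (1 + e^{-iλ} + e^{-2iλ})/3, which vanishes at λ = 2π/3, so an input of exactly that frequency is filtered out (the filter choice is ours, for illustration):

```python
import numpy as np

a = np.array([1.0, 1.0, 1.0]) / 3.0   # moving-average filter coefficients

def transfer(a, lam):
    """Transfer function A(λ) = Σ a_j e^{-ijλ} of a causal filter, eq. (4.28)."""
    j = np.arange(len(a))
    return np.sum(a * np.exp(-1j * j * lam))

# Input with frequency λ = 2π/3 (period 3); by (4.27) the output spectral
# density is |A(2π/3)|² f_X(2π/3) = 0, and indeed the output vanishes:
x = np.cos(2 * np.pi * np.arange(30) / 3.0)
y = np.convolve(x, a, mode="valid")   # Y_t = Σ a_j X_{t-j}
```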

4.2.4 Special models

When modeling time series statistically, one may use one of the followingapproaches: a) parametric modeling; b) nonparametric modeling; and c)


semiparametric modeling. In parametric modeling, the probability distribution of the time series is completely specified a priori, except for a finite dimensional parameter θ = (θ_1, ..., θ_p)^t. In contrast, for nonparametric models, an infinite dimensional parameter is unknown and must be estimated from the data. Finally, semiparametric models have parametric and nonparametric components. A link between parametric and nonparametric models can also be established by data-based choice of the length p of the unknown parameter vector θ, with p tending to infinity with the sample size. Some typical parametric models are:

1. White noise: X_t second order stationary, var(X_t) = σ²,

\[ f_X(\lambda) = \sigma^2/(2\pi), \]

and γ_X(k) = 0 (k ≠ 0).

2. Moving average process of order q, MA(q):

\[ X_t = \mu + \varepsilon_t + \sum_{k=1}^{q} \psi_k \varepsilon_{t-k} \qquad (4.29) \]

with µ ∈ R, ε_t independent identically distributed (iid) random variables, E(ε_t) = 0 and σ_ε² = var(ε_t) < ∞. This can also be written as

\[ X_t - \mu = \psi(B)\varepsilon_t \qquad (4.30) \]

where B is the backshift operator with BX_t = X_{t-1} and ψ(B) = Σ_{k=0}^{q} ψ_k B^k. If Σ_{k=0}^{q} ψ_k z^k = 0 implies |z| > 1, then X_t is invertible in the sense that it can also be written as

\[ X_t - \mu = \sum_{k=1}^{\infty} \varphi_k(X_{t-k} - \mu) + \varepsilon_t. \]

3. Autoregressive process of order p, AR(p):

\[ (X_t - \mu) - \sum_{k=1}^{p} \varphi_k(X_{t-k} - \mu) = \varepsilon_t \qquad (4.31) \]

or φ(B)(X_t − µ) = ε_t, where φ(B) = 1 − Σ_{k=1}^{p} φ_k B^k. If 1 − Σ_{k=1}^{p} φ_k z^k = 0 implies |z| > 1, then X_t is stationary.

4. Autoregressive moving average process, ARMA(p, q):

\[ \varphi(B)(X_t - \mu) = \psi(B)\varepsilon_t. \qquad (4.32) \]

The spectral density is

\[ f_X(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\, \frac{|\psi(e^{i\lambda})|^2}{|\varphi(e^{i\lambda})|^2}. \qquad (4.33) \]
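The spectral density (4.33) can be checked against (4.9) and (4.11): integrating f over [−π, π] must return γ(0), which for an AR(1) process equals σ_ε²/(1 − φ²). The following is a sketch with our own naming:

```python
import numpy as np

def arma_spectrum(lam, phi=(), psi=(), sigma2=1.0):
    """ARMA spectral density (4.33), with phi(z) = 1 - φ1 z - ... and
    psi(z) = 1 + ψ1 z + ...."""
    z = np.exp(1j * lam)
    num = np.abs(1 + sum(p * z ** (k + 1) for k, p in enumerate(psi))) ** 2
    den = np.abs(1 - sum(p * z ** (k + 1) for k, p in enumerate(phi))) ** 2
    return sigma2 / (2 * np.pi) * num / den

lam = np.linspace(-np.pi, np.pi, 20001)
f = arma_spectrum(lam, phi=(0.5,))
# Riemann sum of f over [-π, π] approximates γ(0) = σ²/(1 - φ²):
gamma0 = float(np.sum(f) * (lam[1] - lam[0]))
```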

5. Linear process:

\[ X_t = \mu + \sum_{j=-\infty}^{\infty} \psi_j \varepsilon_{t-j} \qquad (4.34) \]


where the ψ_j depend on a finite dimensional parameter vector θ. The spectral density is

\[ f_X(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi} |\psi(e^{i\lambda})|^2. \]

6. Integrated ARIMA process, ARIMA(p, d, q) (Box and Jenkins 1970):

\[ \varphi(B)((1-B)^d X_t - \mu) = \psi(B)\varepsilon_t \qquad (4.35) \]

with d = 0, 1, 2, ..., where φ(z) and ψ(z) are not zero for |z| ≤ 1. This means that the dth difference (1−B)^d X_t is a stationary ARMA process.

7. Fractional ARIMA process, FARIMA(p, d, q) (Granger and Joyeux 1980, Hosking 1981, Beran 1995):

\[ (1-B)^\delta \varphi(B)\{(1-B)^m X_t - \mu\} = \psi(B)\varepsilon_t \qquad (4.36) \]

with d = m + δ, −1/2 < δ < 1/2, m = 0, 1. Here,

\[ (1-B)^d = \sum_{k=0}^{\infty} (-1)^k \binom{d}{k} B^k \]

with

\[ \binom{d}{k} = \frac{\Gamma(d+1)}{\Gamma(k+1)\Gamma(d-k+1)}. \]

The spectral density of (1−B)^m X_t is

\[ f_X(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\, \frac{|\psi(e^{i\lambda})|^2}{|\varphi(e^{i\lambda})|^2}\, |1 - e^{i\lambda}|^{-2\delta}. \qquad (4.37) \]

The fractional differencing parameter δ plays an important role. If δ = 0, then (1−B)^m X_t is an ordinary ARIMA(p, 0, q) process, with a spectral density such that f_X(λ) converges to a finite value f_X(0) as λ → 0, and the covariances decay exponentially, i.e. |γ_X(k)| ≤ Ca^k for some 0 < C < ∞, 0 < a < 1. The process is therefore said to have short memory. For δ > 0, f_X has a pole at the origin of the form f_X(λ) ∝ λ^{-2δ} as λ → 0, and γ_X(k) ∝ k^{2δ-1}, so that

\[ \sum_{k=-\infty}^{\infty} \gamma_X(k) = \infty. \]

This case is also known as long memory, since autocorrelations decay very slowly (see Beran 1994). On the other hand, if δ < 0, then f_X(λ) ∝ λ^{-2δ} converges to zero at the origin and

\[ \sum_{k=-\infty}^{\infty} \gamma_X(k) = 0. \]

This is called antipersistence, since for large lags there is a negative correlation. The fractional differencing parameter δ, or d = δ + m, is also called the long-memory parameter, and is related to the fractal or Hausdorff dimension d_H (see Chapter 3). For an extended discussion of long-memory and antipersistent processes see e.g. Beran (1994) and references therein.
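The binomial expansion of (1 − B)^d above is easy to evaluate via the Gamma function; for d = 1 it reduces to ordinary differencing, while for fractional d the coefficients decay slowly. A sketch (for non-negative integer d the series terminates, so only the leading terms are computed here):

```python
from math import gamma

def binom_d(d, k):
    """Generalized binomial coefficient Γ(d+1)/(Γ(k+1)Γ(d-k+1))."""
    return gamma(d + 1) / (gamma(k + 1) * gamma(d - k + 1))

def frac_diff_coeffs(d, n):
    """First n coefficients of (1 - B)^d = Σ_k (-1)^k C(d,k) B^k."""
    return [(-1) ** k * binom_d(d, k) for k in range(n)]

# d = 1 recovers the ordinary difference (1 - B);
# d = 0.4 gives the fractional coefficients 1, -0.4, -0.12, ...
```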


8. Fractional Gaussian noise (Mandelbrot and van Ness 1968, Mandelbrot and Wallis 1969): recall that a stochastic process Y_t (t ∈ R) is called self-similar with self-similarity parameter H if, for any c > 0, Y_t =_d c^{-H} Y_{ct}. This definition implies that the covariances of Y_t are equal to

\[ \text{cov}(Y_t, Y_s) = \frac{\sigma^2}{2}\left(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right) \]

where σ² > 0. If Y_t is Gaussian (i.e. all joint distributions are normal), then the process is fully determined by its expected value and the covariance function. Therefore, there is only one self-similar Gaussian process. This process is called fractional Brownian motion B_H(t), with self-similarity parameter 0 < H < 1. The discrete time increment process

\[ X_t = B_H(t) - B_H(t-1) \quad (t \in N) \qquad (4.38) \]

is called fractional Gaussian noise (FGN). FGN is stationary with autocovariances

\[ \gamma(k) = \frac{\sigma^2}{2}\left(|k+1|^{2H} + |k-1|^{2H} - 2|k|^{2H}\right), \qquad (4.39) \]

and the spectral density is equal to (Sinai 1976)

\[ f(\lambda) = 2 c_f (1 - \cos\lambda) \sum_{j=-\infty}^{\infty} |2\pi j + \lambda|^{-2H-1}, \quad \lambda \in [-\pi, \pi] \qquad (4.40) \]

with c_f = c_f(H, σ²) = σ²(2π)^{-1} sin(πH)Γ(2H+1) and σ² = var(X_i). For further discussion see e.g. Beran (1994).

9. Polynomial trend model:

Xt =p∑j=0

βjtj + Ut (4.41)

where Ut is stationary.9. Harmonic or seasonal trend model:

Xt =p∑j=0

αj cosλjt+p∑j=0

αj sinλjt+ Ut (4.42)

with Ut stationary10. Nonparametic trend model:

Xt,n = g(t

n) + Ut (4.43)

with g : [0, 1] → R a “smooth” function (e.g. twice continuously differen-tiable) and Ut stationary.11. Semiparametric fractional autoregressive model, SEMIFAR(p, d, q) (Be-ran 1998, Beran and Ocker 1999, 2001, Beran and Feng 2002a,b):

(1−B)δϕ(B){(1 −B)mXt − g(st)} = Ut (4.44)

©2004 CRC Press LLC

Page 121: Statistics in Musicology

where d, ϕ, εt and g are as above and m = 0, 1. In this case, the centereddifferenced process Yt = (1−B)mXt − g(st) is a fractional ARIMA(p, δ, 0)model. The SEMIFARmodel incorporates stationarity, difference stationar-ity, antipersistence, short memory and long memory, as well as an unspec-ified trend. Incorporating all these components enables us to distinguishstatistically which of the components are present in an observed time se-ries (see Beran and Feng 2002a,b). A software implementation by Beran isincluded in the S−Plus−package FinMetrics and described in Zivot andWang (2002).
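The FGN autocovariance formula (4.39) is easy to evaluate directly. A small sketch (plain Python; the function name is ours): for H = 1/2 the increments are uncorrelated white noise, for H > 1/2 the γ(k) are positive (long memory), and for H < 1/2 they are negative for k ≥ 1 (antipersistence).

```python
def fgn_autocov(k, H, sigma2=1.0):
    """Autocovariance (4.39) of fractional Gaussian noise at lag k."""
    k = abs(k)
    return 0.5 * sigma2 * ((k + 1) ** (2 * H) + abs(k - 1) ** (2 * H)
                           - 2 * k ** (2 * H))
```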

4.2.5 Fitting parametric models

If X_t is a second order stationary model with a distribution function that is known except for a finite dimensional parameter θ^o = (θ^o_1, ..., θ^o_k)^t ∈ Θ ⊆ R^k, then the standard estimation technique is the maximum likelihood method: given an observed time series x_1, ..., x_n, estimate θ by

θ̂ = argmax_{θ∈Θ} h(x_1, ..., x_n; θ)   (4.45)

where h is the joint density function of (X_1, ..., X_n). If observations are discrete, then h is the joint probability P(X_1 = x_1, ..., X_n = x_n). Equivalently, we may maximize the log-likelihood L(x_1, ..., x_n; θ) = log h(x_1, ..., x_n; θ). Under fairly general regularity conditions, θ̂ is asymptotically consistent, in the sense that it converges in probability to θ^o. In other words, lim_{n→∞} P(|θ̂ − θ^o| > ε) = 0 for all ε > 0. In the case of a Gaussian time series with spectral density f_X(λ; θ), we have

L(x_1, ..., x_n; θ) = −(1/2)[n log 2π + log |Σ_n| + (x − x̄)^t Σ_n^{−1} (x − x̄)]   (4.46)

where x = (x_1, ..., x_n)^t, x̄ = x̄ · (1, 1, ..., 1)^t, and |Σ_n| is the determinant of the covariance matrix of (X_1, ..., X_n)^t with elements [Σ_n]_{ij} = cov(X_i, X_j). Since under general conditions n^{−1} log |Σ_n| converges to (2π)^{−1} times the integral of log f_X (Grenander and Szegö 1958), and the (j, l)th element of Σ_n^{−1} can be approximated by ∫ f_X^{−1}(λ) exp{i(j − l)λ} dλ, an approximation to θ̂ can be obtained by the so-called Whittle estimator θ̃ (Whittle 1953; also see e.g. Fox and Taqqu 1986, Dahlhaus 1987) that minimizes

L_n(θ) = (4π)^{−1} ∫_{−π}^{π} [log f_X(λ; θ) + I(λ)/f_X(λ; θ)] dλ.   (4.47)

An alternative approximation for Gaussian processes is obtained by using an autoregressive representation of the type X_t = Σ_{j=1}^{∞} b_j X_{t−j} + ε_t, where the ε_t are independent identically distributed zero mean normal variables with variance σ_ε^2. This leads to minimizing the sum of squared residuals as explained below in Equation (4.50) (see e.g. Box and Jenkins 1970, Beran 1995).
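To make (4.47) concrete, here is a sketch of a discrete Whittle estimator for the simplest case, an AR(1) model with f(λ; φ, σ_ε^2) = (σ_ε^2/2π)|1 − φe^{iλ}|^{−2}. The integral is replaced by a sum over Fourier frequencies, σ_ε^2 is profiled out, and φ is found by a grid search; the function name, grid bounds, and resolution are our choices, not from the text.

```python
import numpy as np

def whittle_ar1(x):
    """Discrete Whittle estimate of phi in an AR(1) model.

    Minimizes the Fourier-frequency version of (4.47); sigma^2 is
    profiled out, leaving log(mean(I/g)) + mean(log g) to minimize,
    where g(lambda; phi) = |1 - phi e^{i lambda}|^{-2}.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = (n - 1) // 2
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x - x.mean())[1 : m + 1]) ** 2 / (2 * np.pi * n)

    def crit(phi):
        g = 1.0 / np.abs(1 - phi * np.exp(1j * lam)) ** 2
        return np.log(np.mean(I / g)) + np.mean(np.log(g))

    phis = np.linspace(-0.95, 0.95, 381)
    return phis[int(np.argmin([crit(p) for p in phis]))]
```

For a simulated AR(1) series with φ = 0.6, the estimate is close to 0.6 for moderate sample sizes, in line with the asymptotics discussed below.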


In general, the actual mathematical and practical difficulty lies in defining a computationally feasible estimation procedure and in obtaining the asymptotic distribution of θ̂. There is a large variety of models for which this has been achieved. Most results are known for linear models X_t = Σ ψ_j ε_{t−j} with iid ε_t. (All examples given in the previous section are linear.) The reason is that, if the distribution of ε_t is known, then the distribution of the process can be recovered from the autocovariances, or equivalently the spectral density, only. Furthermore, if X_t is invertible, i.e. if X_t can be written as X_t = Σ_{k=1}^{∞} ϕ_k X_{t−k} + ε_t, then θ^o can be estimated by maximizing the log-likelihood of the independent variables ε_t:

θ̂ = argmax_{θ∈Θ} Σ_{t=1}^{n} log h_ε(e_t(θ))   (4.48)

where h_ε is the probability density of ε and e_t(θ) = x_t − Σ_{k=1}^{∞} ϕ_k x_{t−k}. For a finite sample, e_t(θ) is approximated by ê_t(θ) = x_t − Σ_{k=1}^{t−1} ϕ_k x_{t−k}. In the simplest case where the ε_t are normally distributed with h_ε(x) = (2πσ_ε^2)^{−1/2} exp{−x^2/(2σ_ε^2)} and θ = (σ_ε^2, θ_2, ..., θ_p) = (σ_ε^2, η), we have e_t(θ) = e_t(η) and

θ̂ = argmin_{θ∈Θ} [Σ_{t=1}^{n} log σ_ε^2 + Σ_{t=1}^{n} (e_t(η)/σ_ε)^2].   (4.49)

Differentiating with respect to θ leads to

η̂ = argmin_η Σ_{t=1}^{n} e_t^2(η)   (4.50)

and σ̂_ε^2 = n^{−1} Σ e_t^2(η̂). Under mild regularity conditions, as n tends to infinity, the distribution of √n(θ̂ − θ) tends to a normal distribution N(0, V) with covariance matrix V = 2B^{−1}, where B is a p × p matrix with elements

B_{ij} = (2π)^{−1} ∫_{−π}^{π} (∂/∂θ_i) log f(λ; θ) · (∂/∂θ_j) log f(λ; θ) dλ

(see e.g. Box and Jenkins 1970, Beran 1995).

The estimation method above assumes that the order of the model, i.e. the length p of the parameter vector θ, is known. This is not the case in general, so that p has to be estimated from data. Information theoretic considerations (based on definitions discussed in Section 3.1) lead to Akaike's famous criterion (AIC; Akaike 1973a,b)

p̂ = argmin_p {−2 log likelihood + 2p}.   (4.51)

More generally, we may minimize AIC_α = −2 log likelihood + αp with respect to p. This includes the AIC (α = 2), the BIC (Bayesian information criterion, Schwarz 1978, Akaike 1979) with α = log n, and the HIC (Hannan and Quinn 1979) with α = 2c log log n (c > 1). It can be shown that, if the observed process is indeed generated by a process from the postulated class of models, and if its order is p_o, then for α ≥ O(2c log log n) the estimated order is asymptotically correct with probability one. In contrast, if α/(2c log log n) → 0 as n → ∞, then the criterion tends to choose too many parameters, in the sense that P(p̂ > p_o) converges to a positive probability. This is, for instance, the case for Akaike's criterion. Thus, if identification of a correct model is the aim, and the observed process is indeed likely to be at least very close to the postulated model class, then α ≥ O(2c log log n) should be used. On the other hand, one may argue that no model is ever correct, so that increasing the number of parameters with increasing sample size may be the right approach. In this case, the original AIC is a good candidate. It should be noted, however, that if p → ∞ as n → ∞, then the asymptotic distribution and even the rate of convergence of θ̂ change, since this is a kind of nonparametric modeling with an ultimately infinite dimensional parameter.
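As an illustration of (4.51) and its generalization AIC_α, the following sketch fits AR(p) models to a zero-mean series by least squares, approximates −2 log likelihood by n log σ̂_ε^2(p) (the Gaussian profile likelihood up to an additive constant), and minimizes the penalized criterion. The function names and the least-squares shortcut are our simplifications, not the book's exact procedure.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit of a zero-mean series;
    returns (coefficients, residual variance)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if p == 0:
        return np.array([]), float(np.mean(x ** 2))
    # row t of X holds (x_{t-1}, ..., x_{t-p}) for t = p, ..., n-1
    X = np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])
    b, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    resid = x[p:] - X @ b
    return b, float(np.mean(resid ** 2))

def order_select(x, p_max=10, alpha=2.0):
    """Choose p minimizing n*log(sigma2_hat(p)) + alpha*p;
    alpha = 2 gives the AIC, alpha = log(n) the BIC."""
    n = len(x)
    crit = [n * np.log(fit_ar(x, p)[1]) + alpha * p for p in range(p_max + 1)]
    return int(np.argmin(crit))
```

With α = log n (BIC), a clearly identified low-order AR process is recovered with high probability, consistent with the consistency discussion above.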

4.2.6 Fitting non- and semiparametric models

Most techniques for fitting nonparametric models rely on smoothing, combined with additional estimation of parameters needed for fine tuning of the smoothing procedure. To illustrate this, consider for instance

(1 − B)^m X_t = g(s_t) + U_t   (4.52)

as defined above, where U_t is second order stationary and s_t = t/n. If m is known, then g may be estimated, for instance, by a kernel smoother

ĝ(s_{t_o}) = (nb)^{−1} Σ_{t=1}^{n} K((s_t − s_{t_o})/b) y_t   (4.53)

as defined in Chapter 2, with y_t = (1 − B)^m x_t. However, results may differ considerably depending on the choice of the bandwidth b (see e.g. Gasser and Müller 1979, Beran and Feng 2002a,b). The optimal bandwidth depends on the nature of the residual process U_t. A criterion for optimality is, for instance, the integrated mean squared error

IMSE = ∫ E{[ĝ(s) − g(s)]^2} ds.

The IMSE can be written as

IMSE = ∫ {E[ĝ(s)] − g(s)}^2 ds + ∫ var(ĝ(s)) ds = ∫ {bias^2 + variance} ds.

The bias only depends on the function g, and is thus independent of the error process. The variance, on the other hand, is a function of the covariances γ_U(k) = cov(U_t, U_{t+k}), or equivalently the spectral density f_U.
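The kernel estimate (4.53) can be sketched in a few lines. The version below normalizes the weights at each point (a Nadaraya–Watson-type variant, which behaves better near the boundaries than the raw (nb)^{−1} normalization) and uses the Epanechnikov kernel; both choices are ours, for illustration only.

```python
import numpy as np

def kernel_smooth(y, b):
    """Kernel estimate of g at the design points s_t = t/n, t = 1, ..., n,
    with Epanechnikov kernel K(u) = 0.75*(1 - u^2) on [-1, 1] and
    weights normalized to sum to one at each point."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    s = np.arange(1, n + 1) / n
    u = (s[None, :] - s[:, None]) / b            # u[i, t] = (s_t - s_i)/b
    K = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    return (K @ y) / K.sum(axis=1)
```

On a noiseless smooth g, the estimate differs from g in the interior only by the bias term of order b², which illustrates the bias part of the IMSE decomposition above.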


The bandwidth that minimizes the IMSE thus depends on the unknown quantities g and f_U. Both g and f_U, therefore, have to be estimated simultaneously in an iterative fashion. For instance, in a SEMIFAR model, the asymptotically optimal bandwidth can be shown to be equal to

b_opt = C_opt n^{(2δ−1)/(5−2δ)}

where C_opt is a constant that depends on the unknown parameter vector θ = (σ_ε^2, d, ϕ_1, ..., ϕ_p)^t. Note that in this case, m is also part of the unknown vector. An algorithm for estimating g as well as θ can be defined by starting with an initial estimate of θ, calculating the corresponding optimal bandwidth, subtracting ĝ from x_t, reestimating θ, estimating the new optimal bandwidth, and so on. Note that in addition the order p is unknown, so that a model choice criterion has to be used at some stage. This complicates matters considerably, and special care has to be taken to define a reliable algorithm. Algorithms that work theoretically as well as practically for reasonably small sample sizes are discussed in Beran and Feng (2002a,b).

4.2.7 Spectral estimation

Sometimes one is only interested in the spectral density f_X of a stationary process or, equivalently, the autocovariances γ_X(k), without modeling the whole distribution of the time series. The reason can be, for instance, that, as discussed above, one may be mainly interested in (random) periodicities, which are identifiable as peaks in the spectral density.

A natural nonparametric estimate of γ_X(k) is the sample autocovariance

γ̂(k) = n^{−1} Σ_{t=1}^{n−k} (x_t − x̄)(x_{t+k} − x̄)   (4.54)

for k ≥ 0, and γ̂(−k) = γ̂(k). The corresponding estimate of f_X is the periodogram

I(λ) = (2π)^{−1} Σ_{k=−(n−1)}^{n−1} γ̂(k) e^{ikλ} = (2πn)^{−1} |Σ_{t=1}^{n} (x_t − x̄) e^{itλ}|^2.   (4.55)

Sometimes a so-called tapered periodogram is used:

I_w(λ) = (2πn)^{−1} |Σ_{t=1}^{n} w(t/n)(x_t − x̄) e^{itλ}|^2

where w is a weight function. It can be shown that E[I(λ)] → f_X(λ) as n → ∞. However, for lags close to n − 1, γ̂(k) is very inaccurate, because one averages over n − k observed pairs only. For instance, for k = n − 1, there is only one observed pair, namely (x_1, x_n), with this lag! As a result, I(λ) does not converge to f_X(λ). Instead, the following holds under mild regularity conditions: if 0 < λ_1 < ... < λ_k < π, then, as n → ∞, the distribution of [2I(λ_1)/f_X(λ_1), ..., 2I(λ_k)/f_X(λ_k)] converges to the distribution of (Z_1, ..., Z_k), where the Z_i are independent χ²_2-distributed random variables. This result is also true for sequences of frequencies 0 < λ_{1,n} < ... < λ_{k,n} < π, as long as the smallest distance between the frequencies, min |λ_{i,n} − λ_{j,n}|, does not converge to zero faster than n^{−1}. Because of the latter condition, and also for computational reasons (fast Fourier transform, FFT; see Cooley and Tukey 1965, Brigham 1988), one usually calculates I(λ) at the so-called Fourier frequencies λ_j = 2πj/n (j = 1, ..., m, with m = [(n − 1)/2]) only. Note that for Fourier frequencies, Σ_{t=1}^{n} e^{itλ_j} = 0, so that

I(λ_j) = (2πn)^{−1} |Σ_{t=1}^{n} x_t e^{itλ_j}|^2.

Thus, the sample mean actually does not need to be subtracted. The periodogram at Fourier frequencies can also be understood as a decomposition of the variance into orthogonal components, analogous to classical analysis of variance (Scheffé 1959): for n odd,

Σ_{t=1}^{n} (x_t − x̄)^2 = 4π Σ_{j=1}^{m} I(λ_j)   (4.56)

and for n even,

Σ_{t=1}^{n} (x_t − x̄)^2 = 4π Σ_{j=1}^{m} I(λ_j) + 2πI(π).   (4.57)

This means that I(λ_j) corresponds to the (empirically observed) contribution of periodic components with frequency λ_j to the overall variability of x_1, ..., x_n.

A consistent estimate of f_X can be obtained by eliminating or downweighing sample autocovariances at too large lags:

f̂(λ) = (2π)^{−1} Σ_{k=−(n−1)}^{n−1} w_n(k) γ̂(k) e^{ikλ}   (4.58)

where w_n(k) = 0 (or becomes negligible) for |k| > M_n, with M_n/n → 0 and M_n → ∞. Equivalently, one can define a smoothed periodogram

f̂(λ) = ∫ W_n(ν − λ) I(ν) dν   (4.59)

for a suitable sequence of window functions W_n such that ∫ W_n(ν − λ) f(ν) dν converges to f(λ) as n → ∞. See e.g. Priestley (1981) for a detailed discussion.

Finally, it should be noted that, in spite of its inconsistency, the raw periodogram is very useful for finding periodicities. In particular, in the case of deterministic periodicities with frequencies ω_j, I(λ) diverges to infinity for λ = ω_j and remains finite (proportional to a χ²_2-variable) elsewhere.
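Numerically, the periodogram at Fourier frequencies is one FFT away. The sketch below (function name ours) also lets us check the analysis-of-variance identity (4.56) for odd n:

```python
import numpy as np

def periodogram(x):
    """Periodogram I(lambda_j), (4.55), at the Fourier frequencies
    lambda_j = 2*pi*j/n, j = 1, ..., m = [(n-1)/2]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = (n - 1) // 2
    d = np.fft.fft(x - x.mean())
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    return lam, np.abs(d[1 : m + 1]) ** 2 / (2 * np.pi * n)
```

For n odd, 4π Σ_j I(λ_j) reproduces Σ(x_t − x̄)² exactly, and a sinusoid at a Fourier frequency shows up as a single dominant peak, illustrating both the variance decomposition and the divergence at deterministic periodicities.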

4.2.8 The harmonic regression model

An important approach to analyzing musical sounds is the harmonic regression model

X_t = Σ_{j=1}^{p} [α_j cos ω_j t + β_j sin ω_j t] + U_t   (4.60)

with U_t stationary. Note that, theoretically, this model can also be understood as a stationary process with jumps in the spectral distribution F_X (see Section 4.2.1). Given ω = (ω_1, ..., ω_p)^t, the parameter vector θ = (α_1, ..., α_p, β_1, ..., β_p)^t can be estimated by the least squares or, more generally, weighted least squares method,

θ̂ = argmin_θ Σ_{t=1}^{n} w(t/n) [x_t − Σ_{j=1}^{p} (α_j cos ω_j t + β_j sin ω_j t)]^2   (4.61)

where w is a weight function. The solution is obtained from the usual linear regression formulas. In many applications the situation is more complex, since the frequencies ω_1, ..., ω_p are also unknown. This leads to a nonlinear regression problem. A simple approximate solution can be given by (Walker 1971, Hannan 1973, Hassan 1982, Brown 1990, Quinn and Thomson 1991)

ω̂ = argmax_{0<ω_1,...,ω_p≤π} Σ_{j=1}^{p} |Σ_{t=1}^{n} w(t/n) x_t e^{iω_j t}|^2 = argmax_ω Σ_{j=1}^{p} I_w(ω_j),   (4.62)

α̂_j = [Σ_{t=1}^{n} w(t/n) x_t cos ω̂_j t] / [Σ_{t=1}^{n} w(t/n)],   (4.63)

and

β̂_j = [Σ_{t=1}^{n} w(t/n) x_t sin ω̂_j t] / [Σ_{t=1}^{n} w(t/n)].   (4.64)

Note that (4.62) means that we look for the p largest peaks in the (w-tapered) periodogram. Under quite general assumptions, the asymptotic distribution of the estimates can be shown to be as follows: the vectors

Z_{n,j} = [√n(α̂_j − α_j), √n(β̂_j − β_j), n^{3/2}(ω̂_j − ω_j)]^t

(j = 1, ..., p) are asymptotically mutually independent, each having a 3-dimensional normal distribution with expected value zero and covariance matrix C(ω_j) that depends on f_U(ω_j) and the weight function w. The formulas for C are as follows (Irizarry 1998, 2000, 2001, 2002):

C(ω_j) = [4π f_U(ω_j) / (α_j^2 + β_j^2)] V(ω_j)   (4.65)


where

V(ω_j) = [ c_1 α_j^2 + c_2 β_j^2    −c_3 α_j β_j             −c_4 β_j
           −c_3 α_j β_j             c_2 α_j^2 + c_1 β_j^2     c_4 α_j
           −c_4 β_j                 c_4 α_j                   c_o ],   (4.66)

c_o = a_o b_o,  c_1 = U_o W_o^{−2},  c_2 = a_o b_1,   (4.67)

c_3 = a_o W_1 W_o^{−2} (W_o^2 W_1 U_2 − W_1^3 U_o − 2 W_o^2 W_2 U_1 + 2 W_o W_1 W_2 U_o),   (4.68)

c_4 = a_o (W_o W_1 U_2 − W_1^2 U_1 − W_o W_2 U_1 + W_1 W_2 U_o),   (4.69)

a_o = (W_o W_2 − W_1^2)^{−2},   (4.70)

b_n = W_n^2 U_2 + W_{n+1}(W_{n+1} U_o − 2 W_n U_1)  (n = 0, 1),   (4.71)

U_n = ∫_0^1 s^n w^2(s) ds,   (4.72)

W_n = ∫_0^1 s^n w(s) ds.   (4.73)

This result can be used to obtain tests and confidence intervals for α_j, β_j and ω_j (j = 1, 2, ..., p), with the unknown quantities α_j, β_j and f_U(ω_j) then replaced by estimates. Note that this involves, in particular, estimation of the spectral density of the residual process U_t.

A quantity that is of particular interest is the difference between a partial ω_j and the corresponding multiple j · ω_1 of the fundamental frequency,

Δ_j = ω_j − j · ω_1.   (4.74)

For many musical instruments, this difference is exactly or approximately equal to zero. The asymptotic distribution given above can be used to test the null hypothesis H_o : Δ_j = 0 or to construct confidence intervals for Δ_j. More specifically, n^{3/2}(Δ̂_j − Δ_j) is asymptotically normal with zero mean and variance

v_Δ = 4π c_o [ f_U(ω_j)/(α_j^2 + β_j^2) + j^2 f_U(ω_1)/(α_1^2 + β_1^2) ].   (4.75)

This can be generalized to any hypothesized relationship Δ_j = ω_j − g(j)ω_1 (see the example of a guitar mentioned in the next section).
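A bare-bones version of the estimation scheme (4.62)–(4.64) with w ≡ 1: take the p largest periodogram ordinates as frequency estimates, then fit the amplitudes by ordinary least squares (a small simplification of (4.63)–(4.64); all names are ours):

```python
import numpy as np

def fit_harmonic(x, p=1):
    """Harmonic regression fit (4.60): frequencies from the p largest
    periodogram ordinates (cf. (4.62) with w = 1), amplitudes by
    linear least squares."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n + 1)
    m = (n - 1) // 2
    mag = np.abs(np.fft.fft(x)[1 : m + 1])
    j = 1 + np.sort(np.argsort(mag)[::-1][:p])   # bins of the p largest peaks
    omegas = 2 * np.pi * j / n
    # columns: cos(omega_j t), sin(omega_j t) for each estimated frequency
    X = np.column_stack([f(w * t) for w in omegas for f in (np.cos, np.sin)])
    coef, *_ = np.linalg.lstsq(X, x, rcond=None)
    return omegas, coef.reshape(p, 2)            # rows (alpha_j, beta_j)
```

For a noise-free sinusoid at a Fourier frequency, the frequency and the pair (α, β) are recovered essentially exactly; off the Fourier grid, the frequency estimate is only accurate to within the grid spacing 2π/n, which is exactly the limitation discussed in Section 4.3.3 below.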

4.2.9 Dominating frequencies in random series

In the harmonic regression model, the main signal consists of deterministic periodic functions. For less harmonic "noisy" signals, a weaker form of periodicity may be observed. This can be modeled by a purely random process whose mth difference Y_t = (1 − B)^m X_t is stationary (m = 0, 1, ...) with a spectral density f that has distinct local maxima. Estimation of local maxima and identification of the corresponding frequencies is considered, for instance, in Newton and Pagano (1983) and Beran and Ghosh (2000). Beran and Ghosh (2000) consider the case where Y_t is a fractional ARIMA(p, d, 0) process of unknown order p. Suppose we want to estimate the frequency ω_max where f assumes its largest local maximum. In a first step, the parameter vector θ = (σ_ε^2, d, ϕ_1, ..., ϕ_p) (with d = δ + m) is estimated by maximum likelihood, and p is chosen by the BIC. Let θ* = (σ_ε^2, δ, θ_3, ..., θ_{p+2}) = (σ_ε^2, η*) and let

f(λ; θ*) = (σ_ε^2/2π) |ϕ(e^{iλ})|^{−2} |1 − e^{iλ}|^{−2δ} = (σ_ε^2/2π) g(λ; η*)   (4.76)

be the spectral density of Y_t. The estimate ω̂_max is then set equal to the frequency where the estimated spectral density f(λ; θ̂*) assumes its maximum. Define

V_p(η*) = 2W^{−1}   (4.77)

where

W_{ij} = (2π)^{−1} [∫_{−π}^{π} (∂/∂u_i) log g(x; u) (∂/∂u_j) log g(x; u) dx]|_{u=η*}  (i, j = 1, ..., p + 1).   (4.78)

Then, as n → ∞,

√n(ω̂_max − ω_max) →_d N(0, τ_p)   (4.79)

with

τ_p = τ_p(η*) = [g″(ω_max; η*)]^{−2} [ġ(ω_max; η*)]^T V_p(η*) [ġ(ω_max; η*)]   (4.80)

where →_d denotes convergence in distribution, g′ and g″ denote derivatives with respect to frequency, and ġ the vector of derivatives with respect to the parameter vector. Note in particular that the order of var(ω̂_max) is n^{−1}, whereas in the harmonic regression model the frequency estimates have variances of the order n^{−3}. The reason is that a deterministic periodic signal is a much stronger form of periodicity and is therefore easier to identify.
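The point estimate ω̂_max itself is just the argmax of the fitted spectral density (4.76) over a frequency grid. A sketch for a fitted fractional AR model with AR coefficients ϕ and memory parameter δ (function names and the grid are our choices):

```python
import numpy as np

def farima_spec(lam, phi, delta, sigma2=1.0):
    """Spectral density (4.76) of a fractional ARIMA(p, delta, 0) process
    with AR coefficients phi = (phi_1, ..., phi_p)."""
    z = np.exp(-1j * lam)
    ar = 1.0 - sum(p * z ** (k + 1) for k, p in enumerate(phi))
    return (sigma2 / (2 * np.pi)) * np.abs(ar) ** -2 * np.abs(1 - z) ** (-2 * delta)

def dominant_frequency(phi, delta, grid=20000):
    """Frequency in (0, pi] where the fitted spectral density is largest."""
    lam = np.linspace(1e-4, np.pi, grid)
    return lam[int(np.argmax(farima_spec(lam, phi, delta)))]
```

For an AR(2) with complex roots, e.g. ϕ = (0.9, −0.81), the peak sits near the modal frequency π/3; with δ > 0 and no AR part, the pole at λ = 0 dominates, illustrating why a distinct local maximum requires the AR component.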

4.3 Specific applications in music

4.3.1 Analysis and modeling of musical instruments

There is an abundance of literature on mathematical modeling of sound signals produced by musical instruments. Since a musical instrument is a very complex physical system, even if conditions are kept fixed, not only deterministic but also statistical models are important. In addition, various factors can play a role. For instance, the sound of a violin depends on the wood it is made of, which manufacturing procedure was used, current atmospheric conditions (temperature, humidity, air pressure), who plays the violin, which particular notes are played in which context, etc. The standard approach that makes modeling feasible is to think of a sound as the result of harmonic components that may change slowly in time, plus "noise" components that may be described by random models. It should be noted, however, that sound is not only produced by an instrument but also perceived by the human ear and brain. Thus, when dealing with the "significance" or "effect" of sounds, physiology, psychology and related scientific disciplines come into play. Here, we are first concerned with the actual "objective" modeling of the physical sound wave. This is a formidable task on its own, and far from being solved in a satisfactory manner.

The scientific study of musical sound signals by physical equations goes back to the 19th century. Helmholtz (1863) proved experimentally that musical sound signals are mainly composed of frequency components that are multiples of a fundamental frequency (also see Rayleigh 1894). Ohm conjectured that the human ear perceives sounds by analyzing the power spectrum (i.e. essentially the periodogram), without taking into account relative phases of the sounds. These conjectures have been mostly confirmed by psychological and physiological experiments (see e.g. Grey 1977, Pierce 1983/1992). Recent mathematical models of instrumental sound waves (see e.g. Fletcher and Rossing 1991) lead to the assumption that, for short time segments, a musical sound signal is stationary and can be written as a harmonic regression model with ω_1 < ω_2 < ... < ω_p. To analyze a musical sound wave, one therefore can divide time into small blocks and fit the harmonic regression model as described above. The lowest frequency ω_1 is called the fundamental frequency and corresponds to what one calls "pitch" in music. The higher frequencies ω_j (j ≥ 2) are called partials, overtones, or harmonics. The amplitudes of the partials, and how they change gradually, are main factors in determining the "timbre" of a sound. For illustration, Figure 4.1 shows the sound wave (air pressure amplitudes) of a piano during 1.9 seconds where first a c′ and then an f′ are played. The signal was sampled in 16-bit format at a sampling rate of 44100 Hz. This corresponds to CD quality and means that every second, 44100 measurements of the sound wave were taken, each of the measurements taking an integer value between −32768 and 32767 (32767 + 32768 + 1 = 2^16). Figure 4.2 shows an enlarged picture of the shaded area in Figure 4.1 (2050 measurements, corresponding to 0.046 seconds). The periodogram (in log-coordinates) of this subseries is plotted in Figure 4.3. The largest peak occurs approximately at the fundamental frequency ω_1 = 441 · 2^{−9/12} ≈ 262.22 Hz of c′.
Note that, since the periodogram is calculated at Fourier frequencies only, ω_1 cannot be identified exactly (see also the remarks below). A small number of partials ω_j (j ≥ 2) can also be seen in Figure 4.3; the contribution of higher partials is, however, relatively small. In contrast, the periodogram of e″♭ played on a harpsichord shows a large number of distinctly important partials (Figures 4.4, 4.5). There is obviously a clear difference between piano and harpsichord in terms of the amplitudes of higher partials. A comprehensive study of instrumental or vocal sounds also needs to take into account the different techniques in which an instrument can be played, and other factors such as the particular pitch ω_1 that is played. This would, however, be beyond the scope of this introductory chapter.

Figure 4.1 Sound wave of c′ and f′ played on a piano.

A specific component that is important for "timbre" is the way in which the coefficients α_j, β_j change in time (see e.g. Risset and Mathews 1969). Readers familiar with synthesizers may recall "envelopes" that are controlled by parameters such as "attack" and "delay". The development of α_j, β_j can be studied by calculating the periodogram for a moving time window and plotting its values against time and frequency in a 3-dimensional or image plot. Thus, we plot the local periodogram (in this context also called



Figure 4.2 Zoomed piano sound wave – shaded area in Figure 4.1.

spectrogram)

I(t, λ) = [2π Σ_{j=1}^{n} W^2((t − j)/(nb))]^{−1} |Σ_{j=1}^{n} W((t − j)/(nb)) e^{−iλj} x_j|^2   (4.81)

where W : R → R_+ is a weight function such that W(u) = 0 for |u| > 1, and b > 0 is a bandwidth that determines how large the window (block) is, i.e. how many consecutive observations are considered to correspond approximately to a harmonic regression model with fixed coefficients α_j, β_j and stationary noise U_t. This is illustrated in color Figure 4.7 for a harpsichord sound, with W(u) = 1{|u| ≤ 1}. Intense pink corresponds to high values of I(t, λ). Figures 4.6a through d show explicitly the change in I(t, λ) between four different blocks. Since the note was played "staccato", the sound wave is very short, namely about 0.1 seconds. Nevertheless, there is a change in the spectrum of the sound, with some of the higher harmonics fading away.

Figure 4.3 Periodogram of piano sound wave in Figure 4.2.

Figure 4.4 Sound wave of e″♭ played on a harpsichord.

Figure 4.5 Periodogram of harpsichord sound wave in Figure 4.4.

Figure 4.6 Harpsichord sound – periodogram plots for different time frames (moving windows of time points).

Figure 4.7 A harpsichord sound and its spectrogram. Intense pink corresponds to high values of I(t, λ). (Color figures follow page 152.)

Apart from the relative amplitudes of partials, most musical sounds include a characteristic nonperiodic noise component. This is a further justification, apart from possible measurement errors, to include a random deviation part in the harmonic regression equation. The properties of the stochastic process U_t are believed to be characteristic for specific instruments (see e.g. Serra and Smith 1991, Rodet 1997). Typical noise components are, for instance, transient noise in percussive instruments, breath noise in wind instruments, or bow noise of string instruments. For a discussion of statistical issues in this context see e.g. Irizarry (2001). For most instruments, not only the harmonic amplitudes but also the characteristics of the noise component change gradually. This may be modeled by smoothly changing processes as defined for instance in Ghosh et al. (1997). Other approaches are discussed in Priestley (1965) and Dahlhaus (1996a,b, 1997) (see Section 4.2.1 above).

Some interesting applications of the asymptotic results in Section 4.2.8 to

questions arising in the analysis of musical sounds are discussed in Irizarry (2001). In particular, the following experiment is described: recordings of a professional clarinet player trying to play concert pitch A (ω_1 = 441 Hz) and a professional guitar player playing D (ω_1 = 146.8 Hz) were made. For the analysis of the clarinet sound, a one-second segment was divided into non-overlapping blocks consisting of 1025 measurements (≈ 23 milliseconds), and the harmonic regression model was fitted to each block separately. For the guitar, the same was done with 60 non-overlapping intervals of 3000 observations each. Two types of results were obtained:

1. The clarinet player turned out to be always out of tune in the sense that the estimated fundamental frequency ω̂_1 was always outside the 95%-acceptance region 441 Hz ± 1.96 √(C_33(ω_1^o)) n^{−3/2}, where the null hypothesis is H_o : ω_1 = ω_1^o = 441 Hz. On the other hand, from the point of view of musical perception, the clarinet player was not out of tune, because the deviation from 441 Hz was less than 0.76 Hz, which corresponds to 0.03 semitones. According to experimental studies, the human ear cannot distinguish notes that are 0.03 semitones apart (Pierce 1983/1992).

2. Physical models (see e.g. Fletcher and Rossing 1991) postulate the following relationships between the fundamental frequency and the partials: for a "harmonic instrument" such as the clarinet, one expects

ω_j = j · ω_1,

whereas for a "plucked string instrument", such as the guitar, one should have

ω_j ≈ c j^2 · ω_1

where c is a constant determined by properties of the strings. The experiment described in Irizarry (2001) supports the assumption for the clarinet in the sense that, in general, the 95%-confidence intervals for the difference ω_j − jω_1 contained 0. For the guitar, his findings suggest a relationship of the form ω_j ≈ c(a + j)^2 ω_1 with a ≠ 0.

4.3.2 Licklider’s theory of pitch perception

Thumfart (1995) uses the theory of discrete evolutionary spectra to derive a simple linear model for pitch perception as proposed by Licklider (1951). The general biological background is as follows (see e.g. Kelly 1991): vibrations of the ear drum caused by sound waves are transferred to the inner ear (cochlea) by three ossicles in the middle ear. The inner ear is a spiral structure that is partitioned along its length by the basilar membrane. The sound wave causes a traveling wave on the basilar membrane, which in turn causes hair cells positioned at different locations to release a chemical transmitter. The chemical transmitter generates nerve impulses to the auditory nerve. At which location on the membrane the highest amplitude occurs, and thus which groups of hair cells are activated, depends on the frequency of the sound wave. This means that certain frequency regions correspond to certain hair groups. Frequency bands with high spectral density f (or high increments dF of the spectral distribution) activate the associated hair groups.

To obtain a simple model for the effect of a sound on the basilar membrane movement, Slaney and Lyon (1991) partition the cochlea into 86 sections, each section corresponding to a particular group of cells. Thumfart (1995) assumes that each group of cells acts like a separate linear filter Ψ_j (j = 1, ..., 86). (This is a simplification compared to Slaney and Lyon, who use nonlinear models.) The wave entering the inner ear is assumed to be the original sound wave X_t, filtered by the outer ear by a linear filter A_1, and by the middle ear by a linear filter A_2. Thus, the output of the inner ear that generates the final nerve impulses consists of 86 time series

Y_{t,j} = Ψ_j(B) A_2(B) A_1(B) X_t  (j = 1, ..., 86).   (4.82)

Calculating tapered local periodograms I_j(u, λ) of Y_{t,j} for each of the 86 sections (j = 1, ..., 86), one can then define the quantity

c(k, j, u) = ∫_{−π}^{π} I_j(u, λ) e^{ikλ} dλ   (4.83)

which Slaney and Lyon call a "correlogram". This is in fact an estimated local autocovariance at lag k for section j and the time segment with midpoint u. The "Slaney-Lyon correlogram" thus essentially characterizes the local autocovariance structure of the resulting nerve impulse series. Thumfart (1995) shows formally how, and under which conditions, this model can be defined within the framework of processes with a discrete evolutionary spectrum. He also suggests a simple method for estimating pitch (the fundamental frequency) at local time u by setting ω̂_1(u) = 2π/k_max(u), where k_max(u) = argmax_k C(k, u) and C(k, u) = Σ_{j=1}^{86} c(k, j, u).
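The rule ω̂_1(u) = 2π/k_max(u) is essentially pitch detection via the autocorrelation maximum. Stripped of the 86-channel cochlear filter bank, the same idea applied to a single raw signal looks like the following toy sketch (the lag bounds and function name are ours; the full model would sum correlograms over the filtered channels):

```python
import numpy as np

def pitch_autocorr(x, fs, kmin=20, kmax=400):
    """Pitch estimate via the lag maximizing the (biased) sample
    autocovariance, mimicking omega_1 = 2*pi/k_max on one channel."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1 :]
    k = kmin + int(np.argmax(acf[kmin : kmax + 1]))
    return fs / k                                  # frequency in Hz
```

For a pure tone, the autocovariance peaks at the lag closest to one period, so the estimate is accurate up to the rounding of the period to an integer number of samples.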

4.3.3 Identification of pitch, tone separation and purity of intonation

In a recent study, Weihs et al. (2001) investigate objective criteria for judg-ing the quality of singing (also see Ligges et al. 2002). The main questionasked in their analysis is how to assess purity of intonation. In an ex-perimental setting, with standardized playback piano accompaniment in arecording studio, 17 singers were asked to sing Handel’s “Tochter Zion” andBeethoven’s “Ehre Gottes aus der Natur”. The audio signal of the vocalperformance was recorded in CD quality in 16-bit format at a sampling rateof 44100 Hz. For the actual statistical analysis, data is reduced to 11000Hz,for computational reasons, and standardized to the interval [-1,1].The first question is how to identify the fundamental frequency (pitch)

ω1. In the harmonic regression model above, estimates of ω1 and the par-tials ωj (2 ≤ j ≤ k) are identical with the k frequencies where the pe-

©2004 CRC Press LLC

Page 137: Statistics in Musicology

riodogram assumes its k largest values. Weihs et al. suggest a simplified(though clearly suboptimal) version of this, in that they consider the peri-odogram at Fourier frequencies λj = 2πj/n (j = 1, 2, ...,m = [(n − 1)/2])only and set

ω1 = minλj∈{λ2,...,λm− 1 }

{λj : I(λj) > max[I(λj−1), I(λj+1)]}. (4.84)

In other words, ω1 corresponds to the Fourier frequency where the first peak of the periodogram occurs. Because of the restriction to Fourier frequencies, the periodogram may have two adjacent peaks, and the estimate is too inaccurate in general. An empirical interpolation formula is suggested by the authors to obtain an improved estimate of ω1. A comparison with harmonic regression is not made, however, so that it is not clear how well the interpolation works in comparison.

Given a procedure for pitch identification, an automatic note separation procedure can be defined. This is a procedure that identifies time points in a sound signal where a new note starts. The interesting result in Weihs et al. is that automatic note separation works better for amateur singers than for professionals. The reason may be the absence of vibrato in amateur voices. In a third step, Weihs et al. address the question of how to assess computationally the purity of intonation based on a vocal time series. This is done using discriminant analysis. The discussion of these results is therefore postponed to Chapter 9.
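The first-peak rule (4.84) can be sketched as follows. The naive O(n²) periodogram and the small guard threshold (which suppresses numerical leakage noise and is not part of (4.84)) are illustrative assumptions:

```python
import math

def periodogram(x):
    """I(lambda_j) at the Fourier frequencies lambda_j = 2*pi*j/n."""
    n = len(x)
    I = []
    for j in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * j * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * j * t / n) for t in range(n))
        I.append((re * re + im * im) / n)
    return I

def first_peak_omega(x):
    """omega_1: the first Fourier frequency whose periodogram value
    exceeds both neighbors, cf. (4.84)."""
    n = len(x)
    I = periodogram(x)
    m = (n - 1) // 2
    guard = 0.01 * max(I)        # not part of (4.84): ignore rounding noise
    for j in range(2, m):        # lambda_2, ..., lambda_{m-1}
        if I[j] > guard and I[j] > max(I[j - 1], I[j + 1]):
            return 2 * math.pi * j / n
    return None
```

For a pure tone lying exactly on a Fourier frequency the rule recovers that frequency; off-grid tones exhibit the adjacent-peak inaccuracy discussed above.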

4.3.4 Music as 1/f noise?

In the 1970s Voss and Clarke (1975, 1978) discovered a seemingly universal "law" according to which music has a 1/f spectrum. By a 1/f spectrum one means that the observed process has a spectral density f such that f(λ) ∝ λ^{−1} as λ → 0. In the sense of definition (4.10), such a density actually does not exist; however, a generalized version of the spectral density exists in the sense that the expected value of the periodogram converges to this function (see Matheron 1973, Solo 1992, Hurvich and Ray 1995). Specifically, Voss and Clarke analyzed acoustic music signals by first transforming the recorded signal Xt in the following way: a) Xt is filtered by a band-pass filter (frequencies outside the interval [10 Hz, 10000 Hz] are eliminated); and b) the "instantaneous power" Yt = Xt² is filtered by another low-pass filter (frequencies above 20 Hz are eliminated). This filtering technique essentially removes higher frequencies but retains the overall shape (or envelope) of each sound wave corresponding to a note and its relative position on the onset axis. In this sense, Voss and Clarke actually analyzed rhythmic structures. A recent, statistically more sophisticated study along this line is described in Brillinger and Irizarry (1998).

One objection to this approach can be that in acoustic signals, structural


Figure 4.8 A harpsichord sound wave (E flat, sampled at 44100 Hz) (a), logarithm of squared amplitudes (b), histogram of the log(power) series (c), and its periodogram on log-log scale (d), together with the fitted SEMIFAR spectrum (d = 0.51).

properties of the composition may be confounded with those of the instruments. Consider, for instance, the harpsichord sound wave in Figure 4.8a. The square of the wave is displayed in Figure 4.8b on a logarithmic scale. The picture illustrates that, apart from the obvious oscillation, the (envelope of the) signal changes slowly. Fitting a SEMIFAR model (with order p ≤ 8 chosen by the BIC) yields a good fit to the periodogram. The estimated fractional differencing parameter is d = 0.51 with a 95%-confidence interval of [0.29, 0.72]. This corresponds to a spectral density (defined in the generalized sense above) that is proportional to λ^{−1.02}, or approximately λ^{−1}. Thus, even in a composition consisting of one single note one would detect 1/f noise in the resulting sound wave.

Instead of recorded sound waves, we therefore consider the score itself, independently of which instrument is supposed to play. This is similar but not identical to considering zero crossings of a sound signal (see Voss and Clarke 1975, 1978, Voss 1988; Brillinger and Irizarry 1998). Figures 4.9a and c show the log-frequencies plotted against onset time for the first movement of Bach's first Cello Suite and for Paganini's Capriccio No. 24. For Bach, the SEMIFAR fit yields d ≈ 0.7 with a 95%-confidence interval of [0.46, 0.93]. This corresponds to a 1/f^{1.4} spectrum; however, 1/f (d = 1/2) is included in the confidence interval. Thus, there is not enough evidence against the 1/f hypothesis. In contrast, for Paganini (Figure 4.11) we obtain d ≈ 0.21 with a 95%-confidence interval of [0.07, 0.35], which excludes 1/f noise. This indicates that there is a larger variety of fractal behavior than the "1/f law" would suggest. Note also that in both cases there is a trend in the data, which is in fact an even stronger type of long memory than the stochastic one. Moreover, Bach's (and, to a lesser degree, Paganini's) spectrum has local maxima in the spectral density, indicating periodicities (see Section 4.2.9). Thus, there is no "pure" 1/f^α behavior but instead a mixture of long-range dependence, expressed by the power law near the origin, and short-range periodicities.

Figure 4.9 Log-frequencies with fitted SEMIFAR trend and log-log periodogram together with SEMIFAR fit for Bach's first Cello Suite (1st movement; a, b) and Paganini's Capriccio No. 24 (c, d), respectively.
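The log-log periodogram fits above can be imitated by a crude GPH-type regression of log I(λj) on −2 log λj over the lowest Fourier frequencies. Unlike SEMIFAR, this sketch ignores trends and short-memory AR components, the bandwidth rule m ≈ √n is only a common convention, and the names are illustrative:

```python
import math

def gph_estimate_d(x, m=None):
    """Crude GPH-type estimate of the long-memory parameter d:
    least-squares slope of log I(lambda_j) on -2*log(lambda_j)
    over the m lowest Fourier frequencies."""
    n = len(x)
    if m is None:
        m = int(n ** 0.5)  # rule-of-thumb bandwidth
    xs, ys = [], []
    for j in range(1, m + 1):
        lam = 2 * math.pi * j / n
        re = sum(x[t] * math.cos(lam * t) for t in range(n))
        im = sum(x[t] * math.sin(lam * t) for t in range(n))
        I = (re * re + im * im) / (2 * math.pi * n)  # periodogram at lam
        xs.append(-2 * math.log(lam))
        ys.append(math.log(I))
    # least-squares slope of ys on xs
    mx, my = sum(xs) / m, sum(ys) / m
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```

For white noise the estimate is near d = 0; values near 1/2 would correspond to the 1/f spectrum discussed above.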

Finally, consider an alternative quantity, namely the local variability of notes modulo octave. Since we are in Z12, a measure of variability for circular data should be used. Here, we use the measure V = (1 − R) as defined in Chapter 7, or rather the transformed variable log[(V + 0.05)/(1.05 − V)]. The resulting standardized time series are displayed in Figures 4.10a and c. The log-log plots of the periodograms and fitted SEMIFAR spectra are given in Figures 4.10b and d respectively. The estimated long-memory parameters


Figure 4.10 Local variability with fitted SEMIFAR trend and log-log periodogram together with SEMIFAR fit for Bach's first Cello Suite (1st movement; a, b) and Paganini's Capriccio No. 24 (c, d), respectively.

are similar to before, namely d = 0.51 ([0.20, 0.81]) for Bach and 0.33 ([0.24, 0.42]) for Paganini.


Figure 4.11 Niccolò Paganini (1782-1840). (Courtesy of Zentralbibliothek Zürich.)


CHAPTER 5

Hierarchical methods

5.1 Musical motivation

Musical structures are typically generated in a hierarchical manner. Most compositions can be divided approximately into natural segments (e.g. movements of a sonata); these are again divided into smaller units (e.g. exposition, development, and coda of a sonata movement). These can again be divided into smaller parts (e.g. melodic phrases), and so on. Different parts even at the same hierarchical level need not be disjoint. For instance, different melodic lines may overlap. Moreover, different parts are usually closely related within and across levels. A general mathematical approach to understanding the vast variety of possibilities can be obtained, for instance, by considering a hierarchy of maps defined in terms of a manifold (see e.g. Mazzola 1990a). The concept of hierarchical relationships and similarities is also related to "self-similarity" and fractals as defined in Mandelbrot (1977) (see Chapter 3). To obtain more concrete results, hierarchical regression models have been developed in the last few years (Beran and Mazzola 1999a,b, 2000, 2001).

5.2 Basic principles

5.2.1 Hierarchical aggregation and decomposition

Suppose that we have two time series Yt, Xt and we wish to model the relationship between Yt and Xt. The simplest model is simple linear regression

Yt = βo + β1Xt + εt (5.1)

where εt is a stationary zero mean process independent of Xt. If Yt and Xt

are expected to be "hierarchical", then we may hope to find a more realistic model by first decomposing Xt (and possibly also Yt) and searching for dependence structures between Yt (or its components) and the components of Xt. Thus, given a decomposition Xt = Xt,1 + ... + Xt,M, we consider the multiple regression model

Yt = βo + ∑_{j=1}^{M} βj Xt,j + εt   (5.2)


with εt second order stationary and E(εt) = 0. Alternatively, if Yt = Yt,1 + ... + Yt,L, we may consider a system of L regressions

Yt,1 = β01 + ∑_{j=1}^{M} βj1 Xt,j + εt,1

Yt,2 = β02 + ∑_{j=1}^{M} βj2 Xt,j + εt,2

...

Yt,L = β0L + ∑_{j=1}^{M} βjL Xt,j + εt,L.

Three methods of hierarchical regression based on decompositions will be discussed here: HIREG, hierarchical regression using explanatory variables obtained by kernel smoothing with predetermined fixed bandwidths; HISMOOTH, hierarchical smoothing models with automatic bandwidth selection; and HIWAVE, hierarchical wavelet models.

5.2.2 Hierarchical regression

Given an explanatory time series Xt (t = 1, 2, ..., n), a smoothing kernel K, and a hierarchy of bandwidths b1 > b2 > ... > bM > 0, define

Xt,1 = (nb1)^{−1} ∑_{s=1}^{n} K((t − s)/(nb1)) Xs   (5.3)

and for 1 < j ≤ M,

Xt,j = (nbj)^{−1} ∑_{s=1}^{n} K((t − s)/(nbj)) [Xs − ∑_{l=1}^{j−1} Xs,l]   (5.4)

The collection of time series {X1,j, ..., Xn,j} (j = 1, ..., M) is called a hierarchical decomposition of Xt. The HIREG model is then defined by (5.2). If the εt (t = 1, 2, ...) are independent, then the usual techniques of multiple linear regression can be used (see e.g. Plackett 1960, Rao 1973, Ryan 1996, Srivastava and Sen 1997, Draper and Smith 1998). In the case of correlated errors εt, appropriate adjustments of tests, confidence intervals, and parameter selection techniques must be made. The main assumption in the HIREG model is that we know which bandwidths to use. In some cases this may indeed be true. For instance, if there is a three-four meter at the beginning of a musical score, then bandwidths that are divisible by three are plausible.
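A minimal sketch of the decomposition (5.3)-(5.4): each level smooths what the coarser levels left over. For numerical convenience the sketch normalizes the kernel weights (a Nadaraya-Watson variant of the raw 1/(nbj) scaling above), and the Gaussian kernel and function names are illustrative assumptions:

```python
import math

def gauss_kernel(u):
    return math.exp(-0.5 * u * u)

def hierarchical_decomposition(x, bandwidths):
    """Components X_{t,j} in the spirit of (5.3)-(5.4), with normalized
    weights and b_1 > b_2 > ... > b_M. Each level smooths the residual
    of all coarser levels."""
    n = len(x)
    residual = list(x)
    components = []
    for b in bandwidths:
        comp = []
        for t in range(n):
            w = [gauss_kernel((t - s) / (n * b)) for s in range(n)]
            sw = sum(w)
            comp.append(sum(wi * ri for wi, ri in zip(w, residual)) / sw)
        components.append(comp)
        # subtract this component before moving to the finer bandwidth
        residual = [r - c for r, c in zip(residual, comp)]
    return components
```

Summing all components plus the final residual recovers Xt exactly, since every level only redistributes what the coarser levels left over.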


5.2.3 Hierarchical smoothing

Beran and Mazzola (1999b) consider the case where the bandwidths bj are not known a priori. Essentially, this amounts to a nonlinear regression model Yt = βo + ∑_{j=1}^{M} βj Xt,j + εt where not only the βj (j = 0, ..., M) are unknown, but also b1, ..., bM, and possibly the order M, have to be estimated. The following definition formalizes the idea (for simplicity it is given for the case of one explanatory series Xt only):

Definition 40 For integers M, n > 0, let β = (β1, ..., βM) ∈ R^M, b = (b1, ..., bM) ∈ R^M with b1 > b2 > ... > bM > 0, ti ∈ [0, T], 0 < T < ∞, t1 < t2 < ... < tn, and θ = (β, b)^t. Denote by K : [−1, 1] → R+ a non-negative symmetric kernel function such that ∫ K(u) du = 1 and K is twice continuously differentiable, and define for b > 0 and t ∈ [0, T] the Nadaraya-Watson weights (Nadaraya 1964, Watson 1964)

a_b(t, ti) = K((t − ti)/b) / ∑_{j=1}^{n} K((t − tj)/b)   (5.5)

Also, let εi (i ∈ Z) be a stationary zero mean process satisfying suitable moment conditions, fε the spectral density of εi, and assume εi to be independent of Xi. Then the sequence of bivariate time series {(X1,n, Y1,n), ..., (Xn,n, Yn,n)} (n = 1, 2, 3, ...) is a Hierarchical Smoothing Model (or HISMOOTH model) if

Yi,n = Y(ti) = ∑_{j=1}^{M} βj g(ti; bj) + εi   (5.6)

where ti = i/n and

g(ti; bj) = ∑_{l=1}^{n} a_{bj}(ti, tl) Xl,n   (5.7)

Denote by θo = (βo, bo)^t the true parameter vector. Then θo can be estimated by a nonlinear least squares method as follows: define

ei(θ) = Y(ti) − ∑_{j=1}^{M} βj g(ti; bj)   (5.8)

as a function of θ = (β, b)^t, let S(θ) = ∑_{i=1}^{n} ei²(θ), and write ġ = ∂g/∂b. Then

θ̂ = argmin_θ S(θ)   (5.9)

or equivalently

∑_{i=1}^{n} ψ(ti, y; θ̂) = 0   (5.10)

where ψ = (ψ1, ..., ψ2M)^t,

ψj(t, y; θ) = ei(θ) g(t; bj)   (5.11)

for j = 1, ..., M, and

ψj(t, y; θ) = ei(θ) β_{j−M} ġ(t; b_{j−M})   (5.12)

for j = M + 1, ..., 2M. Under suitable assumptions, the estimate θ̂ is asymptotically normal. More specifically, set

hi(t; θo) = g(t; bi) (i = 1, ..., M)   (5.13)

hi(t; θo) = β_{i−M} ġ(t; b_{i−M}) (i = M + 1, ..., 2M)   (5.14)

Σ = [γε(i − j)]_{i,j=1,...,n} = [cov(εi, εj)]_{i,j=1,...,n}   (5.15)

and define the 2M × n matrix

G = G_{2M×n} = [hi(tj; θo)]_{i=1,...,2M; j=1,...,n}   (5.16)

and the 2M × 2M matrix

Vn = (GG^t)^{−1}(GΣG^t)(GG^t)^{−1}   (5.17)

The following assumptions are sufficient to obtain asymptotic normality:

(A1) fε(λ) ∼ cf |λ|^{−2d} (cf > 0) as λ → 0 with −1/2 < d < 1/2;

(A2) Let

a_r = n^{−1} ∑_{i,j=1}^{n} γε(i − j) g(ti; b_r) g(tj; b_r),

b_{rs} = n^{−1} ∑_{i,j=1}^{n} γε(i − j) ġ(ti; b_r) ġ(tj; b_s).

Then, as n → ∞, lim inf |a_r| > 0 and lim inf |b_{rs}| > 0 for all r, s ∈ {1, ..., M}.

(A3) x(ti) = ξ(ti) where ξ : [0, T] → R is a function in C[0, T], T < ∞.

(A4) The set of time points converges to a set A that is dense in [0, T].

Then we have (Beran and Mazzola 1999b):

Theorem 12 Let Θ1 and Θ2 be compact subsets of R and R+ respectively, Θ = Θ1^M × Θ2^M, and let η = (1/2) min{1, 1 − 2d}. Suppose that (A1), (A2), (A3), and (A4) hold and θo is in the interior of Θ. Then, as n → ∞,

(i) θ̂ →p θo;

(ii) Vn → V where V is a symmetric positive definite 2M × 2M matrix;

(iii) n^η (θ̂ − θo) →d N(0, V).


Thus, θ̂ is asymptotically normal, but for d > 0 (i.e. long-memory errors), the rate of convergence n^{1/2−d} is slower than the usual n^{1/2} rate.

A particular aspect of HISMOOTH models is that the bandwidths bj are fixed positive unknown parameters that are estimated from the data. This means that, in contrast to nonparametric regression models (see e.g. Gasser and Müller 1979, Simonoff 1996, Bowman and Azzalini 1997, Eubank 1999), the notion of an optimal bandwidth does not exist here. There is a fixed true bandwidth (or a vector of true bandwidths) that has to be estimated. A HISMOOTH model is in fact a semiparametric nonlinear regression rather than a nonparametric smoothing model.

Theorem 12 can be interpreted as multiple linear regression where uncertainty due to (explanatory) variable selection is taken into account. The set of possible combinations of explanatory variables is parametrized by a continuous bandwidth-parameter vector b ∈ Θ2^M. Confidence intervals for β based on the asymptotic distribution of θ̂ take into account additional uncertainty due to "variable selection" from the (infinite) parametric family of M explanatory variables X = {(x_{b1}, ..., x_{bM}) : bj ∈ Θ2, b1 > b2 > ... > bM}.

For the practical implementation of the model, the following algorithms that include estimation of M are defined in Beran and Mazzola (1999b). If M is fixed, then the algorithm consists of two basic steps: a) generation of the set of all possible explanatory variables xs (s ∈ S), and b) selection of M variables (bandwidths) that maximize R². This means that after step 1, the estimation problem is reduced to variable selection in multiple regression, with a fixed number M of explanatory variables. Standard regression software, such as the function leaps in S-Plus, can be used for this purpose. The detailed algorithm is as follows:

Algorithm 1 Define a sufficiently fine grid S = {s1, ..., sk} ⊂ Θ2 and carry out the following steps:

Step 1: Define k explanatory time series xs = [xs(t1), ..., xs(tn)]^t (s ∈ S) by xs(ti) = g(ti, s).

Step 2: For each b = (b1, ..., bM) ∈ S^M with bi > bj (i < j), define the n × M matrix X = (x_{b1}, ..., x_{bM}) and let β̂ = β̂(b) = (X^tX)^{−1}X^t y. Also, denote by R²(b) the corresponding value of R² obtained from least squares regression of y on X.

Step 3: Define θ̂ = (β̂, b̂)^t by b̂ = argmax_b R²(b) and β̂ = β̂(b̂).
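Steps 1 through 3 can be sketched for the special case M = 1, where R² reduces to the squared correlation between y and the smoothed explanatory series. The Gaussian Nadaraya-Watson smoother stands in for g(ti, s), and all names are illustrative:

```python
import math

def nw_smooth(x, b):
    """Explanatory series x_b(t_i) = g(t_i; b) of Step 1, via
    Nadaraya-Watson weights with a Gaussian kernel."""
    n = len(x)
    out = []
    for i in range(n):
        w = [math.exp(-0.5 * ((i - l) / (n * b)) ** 2) for l in range(n)]
        sw = sum(w)
        out.append(sum(wl * xl for wl, xl in zip(w, x)) / sw)
    return out

def select_bandwidth(y, x, grid):
    """Steps 2-3 of Algorithm 1 for M = 1: pick the bandwidth whose
    smoothed series maximizes R^2 (squared correlation)."""
    n = len(y)
    my = sum(y) / n
    best_b, best_r2 = None, -1.0
    for b in grid:
        xs = nw_smooth(x, b)
        mx = sum(xs) / n
        sxy = sum((a - mx) * (c - my) for a, c in zip(xs, y))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((c - my) ** 2 for c in y)
        r2 = sxy * sxy / (sxx * syy)
        if r2 > best_r2:
            best_b, best_r2 = b, r2
    return best_b, best_r2
```

For M > 1 the same grid search runs over bandwidth vectors, with R² from multiple regression as in Step 2.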

If M is unknown, then the algorithm can be modified, for instance by increasing M as long as all β-coefficients are significant. In order to calculate the standard deviation of β̂ at each stage, the error process εi needs to be modeled explicitly. Beran and Mazzola (1999b) use fractional autoregressive models together with the BIC for choosing the order of the process. This leads to

Algorithm 2 Define a sufficiently fine grid S = {s1, ..., sk} ⊂ Θ2 for the bandwidths, and calculate k explanatory time series xs (s ∈ S) by xs(ti) = g(ti, s). Furthermore, define a significance level α, set Mo = 0, and carry out the following steps:

Step 1: Set M = Mo + 1.

Step 2: For each b = (b1, ..., bM) ∈ S^M with bi > bj (i < j), define the n × M matrix X = (x_{b1}, ..., x_{bM}) and let β̂ = β̂(b) = (X^tX)^{−1}X^t y. Also, denote by R²(b) the corresponding value of R² obtained from least squares regression of y on X.

Step 3: Define θ̂ = (b̂, β̂)^t by b̂ = argmax_b R²(b) and β̂ = β̂(b̂).

Step 4: Let e(θ̂) = [e1, ..., en]^t be the vector of regression residuals. Assume that ei is a fractional autoregressive process of unknown order p characterized by a parameter vector ζ = (σε², d, φ1, ..., φp). Estimate p and ζ by maximum likelihood and the BIC.

Step 5: Calculate for each j = 1, ..., M the estimated standard deviation σj(ζ̂) of β̂j, and set

pj = 2[1 − Φ(|β̂j| σj^{−1}(ζ̂))]

where Φ denotes the cumulative standard normal distribution function. If max(pj) < α, set Mo = Mo + 1 and repeat Steps 1 through 5. Otherwise, stop the iteration and set M̂ = Mo and θ̂ equal to the corresponding estimate.

5.2.4 Hierarchical wavelet models

Wavelet decomposition has become very popular in statistics and many fields of application in the last few years. This is due to its flexibility in depicting local features at different levels of resolution. There is an extensive literature on wavelets, spanning a vast range from profound mathematical foundations and mathematical statistics to concrete applications such as data compression, image and sound processing, and data analysis, to name only a few. For references see for example Daubechies (1992), Meyer (1992, 1993), Kaiser (1994), Antoniadis and Oppenheim (1995), Ogden (1996), Mallat (1998), Härdle et al. (1998), Vidakovic (1999), Percival and Walden (2000), Jansen (2001), Jaffard et al. (2001). The essential principle of wavelets is to express square integrable functions in terms of orthogonal basis functions that are zero except in a small neighborhood, the neighborhoods being hierarchical in size. The set of basis functions Ψ = {ϕok, k ∈ Z} ∪ {ψjk, j, k ∈ Z} is generated by two functions only, the father wavelet ϕ and the mother wavelet ψ, by up/downscaling and shifting of location. If scaling is done by powers of 2 and shifting by integers, then the basis functions are:

ϕok(x) = ϕoo(x− k) = ϕ(x− k) (k ∈ Z) (5.18)


ψjk(x) = 2^{j/2} ψoo(2^j x − k) = 2^{j/2} ψ(2^j x − k) (j ∈ N, k ∈ Z)   (5.19)

With respect to the scalar product ⟨g, h⟩ = ∫ g(x)h(x) dx, these basis functions are orthonormal:

⟨ϕok, ϕom⟩ = 0 (k ≠ m),  ⟨ϕok, ϕok⟩ = ||ϕok||² = 1   (5.20)

⟨ψjk, ψlm⟩ = 0 (k ≠ m or j ≠ l),  ⟨ψjk, ψjk⟩ = ||ψjk||² = 1   (5.21)

⟨ψjk, ϕol⟩ = 0   (5.22)

Every function g in L²(R) (the space of square integrable functions on R) has a unique representation

g(x) = ∑_{k=−∞}^{∞} ak ϕok(x) + ∑_{j=0}^{∞} ∑_{k=−∞}^{∞} bjk ψjk(x)   (5.23)

     = ∑_{k=−∞}^{∞} ak ϕ(x − k) + ∑_{j=0}^{∞} ∑_{k=−∞}^{∞} bjk 2^{j/2} ψ(2^j x − k)   (5.24)

where

ak = ⟨g, ϕok⟩ = ∫ g(x) ϕok(x) dx   (5.25)

and

bjk = ⟨g, ψjk⟩ = ∫ g(x) ψjk(x) dx   (5.26)

Note in particular that ∫ g²(x) dx = ∑ ak² + ∑ bjk². The purpose of this representation is a decomposition with respect to frequency and time. A simple wavelet, for which the meaning of the decomposition can be understood directly, is the Haar wavelet with

ϕ(x) = 1{0 ≤ x < 1} (5.27)

where 1{0 ≤ x < 1} = 1 for 0 ≤ x < 1 and zero otherwise, and

ψ(x) = 1{0 ≤ x < 1/2} − 1{1/2 ≤ x < 1}.   (5.28)

For the Haar basis functions ϕok, we have coefficients

ak = ∫_{k}^{k+1} g(x) dx   (5.29)

Thus, the coefficients of the basis functions ϕok are equal to the average value of g in the interval [k, k + 1]. For ψjk we have

bjk = 2^{j/2} [ ∫_{2^{−j}k}^{2^{−j}(k+1/2)} g(x) dx − ∫_{2^{−j}(k+1/2)}^{2^{−j}(k+1)} g(x) dx ]   (5.30)

which is proportional to the difference between the average values of g in the intervals 2^{−j}k ≤ x < 2^{−j}(k + 1/2) and 2^{−j}(k + 1/2) ≤ x < 2^{−j}(k + 1). This can be interpreted as a (signed) measure of variability. Since each interval Ijk = [2^{−j}k, 2^{−j}(k + 1)] has length 2^{−j} and midpoint 2^{−j}(k + 1/2), the coefficients bjk (or their squares bjk²) characterize the variability of g at different scales 2^{−j} (j = 0, 1, 2, ...) and on a grid of locations 2^{−j}(k + 1/2) that becomes finer as the scale decreases with increasing values of j.

Suppose now that a time series (function) yt is observed at a finite number of discrete time points t = 1, 2, ..., n with n = 2^m. To relate this to wavelet decomposition in continuous time, one can construct a piecewise constant function in continuous time by

gn(x) = ∑_{k=0}^{n−1} yk 1{k/n ≤ x < (k + 1)/n} = ∑_{k=0}^{n−1} yk 1{2^{−m}k ≤ x < 2^{−m}(k + 1)}   (5.31)

Since gn is a step function (like the Haar basis functions themselves) and zero outside the interval [0, 1), the Haar wavelet decomposition of gn has only a finite number of nonzero terms:

gn(x) = aoo + ∑_{j=0}^{m−1} ∑_{k=0}^{2^j −1} bjk ψjk(x)   (5.32)

Note that gn assumes only a finite number of values gn(x) = y_{nx} (x = 1/n, 2/n, ..., 1). Moreover, ψjk(x) = 2^{j/2} ψ(2^j x − k) is nonzero on [0, 1) only for 0 ≤ k ≤ 2^j − 1. Therefore, Equation (5.32) can be written in matrix form, and calculation of the coefficients aoo and bjk can be done by matrix inversion. Since matrix inversion may not be feasible for large data sets, various efficient algorithms such as the so-called discrete wavelet transform have been developed (see e.g. Percival and Walden 2000).

An interesting interpretation of wavelet decomposition can be given in terms of total variability. The total variability of an observed series can be decomposed into contributions of the basis functions by

∑_{t} (yt − ȳ)² = ∑_{j=0}^{m−1} ∑_{k=0}^{2^j −1} bjk².   (5.33)
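The identity (5.33) can be checked directly with the discrete orthonormal Haar transform, whose detail coefficients correspond to the bjk of (5.32) under the normalization (5.31); a minimal sketch with illustrative names:

```python
def haar_details(y):
    """Discrete orthonormal Haar transform of y (length n = 2^m).
    Returns (smooth, levels): the final smooth coefficient and the
    detail coefficients from coarse (j = 0) to fine (j = m - 1)."""
    s = list(y)
    levels = []
    while len(s) > 1:
        levels.append([(s[2 * k] - s[2 * k + 1]) / 2 ** 0.5
                       for k in range(len(s) // 2)])
        s = [(s[2 * k] + s[2 * k + 1]) / 2 ** 0.5 for k in range(len(s) // 2)]
    return s[0], levels[::-1]

def energy_decomposition(y):
    """Sum of squared detail coefficients per level; by orthonormality
    their grand total equals sum((y_t - mean)^2), the discrete analogue
    of (5.33)."""
    _, levels = haar_details(y)
    return [sum(c * c for c in d) for d in levels]
```

The per-level sums are exactly the quantities displayed in the time-frequency plots discussed next: how much of the total variability sits at each scale.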

A plot of bjk² against j (or 2^j = "frequency", or 2^{−j} = "period") and k ("location") shows for each k and j how much of the signal's variability is due to variation at the corresponding location k and frequency 2^j.

To illustrate how wavelet decomposition works, consider the following simulated example: let xi = 2 cos(2πi/90) for i ∈ {1, ..., 300} ∪ {501, ..., 700} ∪ {901, ..., 1024}; for 301 ≤ i ≤ 500, set xi = (1/2) cos(2πi/10); and for 701 ≤ i ≤ 900, xi = 15 cos(2πi/10) + (1/10000)(i − 200)². The observed signal thus consists of several periodic segments with different frequencies and amplitudes, the largest amplitude occurring between t = 701 and 900, together with a slight trend. Figure 5.1a displays xi. The coefficients for the four highest levels (i.e. j = 0, 1, 2, 3) are plotted against time in Figure 5.1b. Note that D stands for mother and S for father wavelet. Moreover, the numbering in the plot (as given in S-Plus) is opposite to the one given above: s4 and d4 in the plot correspond to the coarsest level j = 0 above. The corresponding functions at the different levels are given in Figure 5.1c. The ten and fifty largest basis contributions are given in Figures 5.1d and e respectively (together with the data on top and residuals at the bottom). Figure 5.1f shows the time-frequency plot of the squared coefficients in the wavelet decomposition of xi. Bright shading corresponds to large coefficients. All plots emphasize the high-frequency portion with large amplitude between i = 701 and 900. Moreover, the trend at this location is visible through the coefficient values of the father wavelet ϕ (s4 in the plot) and the slightly brighter shading in the lowest frequency band of the time-frequency plot.

An alternative to HISMOOTH models can be defined via wavelets (the following definition is a slight modification of Beran and Mazzola 2001):

Definition 41 Let φ, ψ ∈ L²(R) be a father and the corresponding mother wavelet respectively, φk(·) = φ(· − k), ψj,k = 2^{j/2} ψ(2^j · − k) (k ∈ Z, j ∈ N) the orthogonal wavelet basis generated by φ and ψ, and ui and εi (i ∈ Z) independent stationary zero mean processes satisfying suitable moment conditions. Assume X(ti) = g(ti) + ui with g ∈ L²[0, T], ti ∈ [0, T], and wavelet decomposition g(t) = ∑ ak φk(t) + ∑ bj,k ψj,k(t). For 0 = c_{M+1} < c_M < ... < c_1 < c_o = ∞ let

g(t; c_{i−1}, c_i) = ∑_{c_i ≤ |ak| < c_{i−1}} ak φk(t) + ∑_{c_i ≤ |bj,k| < c_{i−1}} bj,k ψj,k(t).

Then (X(ti), Y(ti)) (i = 1, ..., n) is a Hierarchical Wavelet Model (HIWAVE model) of order M, if there exist M ∈ N, β = (β1, ..., βM) ∈ R^M, η = (η1, ..., ηM) ∈ R^M_+, 0 < ηM < ... < η1 < ηo = ∞ such that

Y(ti) = ∑_{l=1}^{M} βl g(ti; η_{l−1}, ηl) + εi.   (5.34)

The definition means that the time series Y(t) is decomposed into orthogonal components that are proportional to certain "bands" in the wavelet decomposition of the explanatory series X(t), the bands being defined by the size of the wavelet coefficients. As for HISMOOTH models, the parameter vector θ = (β, η)^t can be estimated by nonlinear least squares regression. To illustrate how HIWAVE models may be used, consider the following simulated example: let xi = g(ti) (i = 1, ..., 1024) as in the previous example. The function g is decomposed into g(t) = g(t; ∞, η1) + g(t; η1, 0) = g1(t) + g2(t) where η1 is such that 50 wavelet coefficients of g are larger than or equal to η1. Figure 5.2 shows g, g1, and g2. A simulated series of response variables, defined by Y(ti) = 2g1(ti) + εi (t = 1, ..., 1024) with independent zero-mean normal errors εi with variance σε² = 100, is shown in Figure 5.3b.
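The band g(t; ∞, η1) of Definition 41 can be imitated in code: pick η1 so that a prescribed number of detail coefficients survive, zero the rest, and invert the transform. In this sketch Haar stands in for the unspecified wavelet basis, the smooth term is always retained, and all names are illustrative:

```python
def haar_forward(y):
    """Orthonormal Haar transform of y (length 2^m): smooth term plus
    detail levels ordered coarse -> fine."""
    s = list(y)
    levels = []
    while len(s) > 1:
        levels.append([(s[2 * k] - s[2 * k + 1]) / 2 ** 0.5
                       for k in range(len(s) // 2)])
        s = [(s[2 * k] + s[2 * k + 1]) / 2 ** 0.5 for k in range(len(s) // 2)]
    return s[0], levels[::-1]

def haar_inverse(smooth, levels):
    """Invert haar_forward."""
    s = [smooth]
    for d in levels:  # coarse -> fine, doubling the length each step
        s = [v for k in range(len(d))
             for v in ((s[k] + d[k]) / 2 ** 0.5, (s[k] - d[k]) / 2 ** 0.5)]
    return s

def wavelet_band(y, n_keep):
    """g(t; infinity, eta_1) in the spirit of Definition 41: choose eta_1
    so that n_keep detail coefficients survive (|coef| >= eta_1), zero
    the rest, and reconstruct. Ties at eta_1 may keep a few extras."""
    smooth, levels = haar_forward(y)
    mags = sorted((abs(c) for d in levels for c in d), reverse=True)
    eta1 = mags[n_keep - 1] if n_keep >= 1 else float("inf")
    kept = [[c if abs(c) >= eta1 else 0.0 for c in d] for d in levels]
    return haar_inverse(smooth, kept)
```

Keeping all coefficients reproduces y exactly; keeping only the largest few yields the g1-type component against which the response is regressed below.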


Figure 5.1 Simulated signal (a) and wavelet coefficients up to j = 4 (s4, d4, d3, d2, d1; numbered in reversed order) (b).


Figure 5.1 c and d: wavelet components of the simulated signal in a (components up to j = 4; the largest ten components).


Figure 5.1 e and f: wavelet components of the simulated signal in a and time-frequency plot of the coefficients.


Figure 5.2 Decomposition of the x-series in the simulated HIWAVE model: x and its components g1 = first 50 components of x (left) and g2 = x − g1 (right).

A comparison of the two scatter plots in Figures 5.3c and d shows a much clearer dependence between y and g1 as compared to y versus x = g. Figure 5.3e illustrates that there is no relationship between y and g2. Finally, the time-frequency plot in Figure 5.3f indicates that the main periodic behavior occurs for t ∈ {701, ..., 900}. The difficulty in practice is that the correct decomposition of x into g1 and the redundant component g2 is not known a priori. Figure 5.4 shows y and the HIWAVE curve β̂o + β̂1 g(ti; ∞, η̂1) (for graphical reasons the fitted curve is shifted vertically) fitted by nonlinear least squares regression. Apparently, the algorithm identified η1, and hence the relevant time span [701, 900], quite exactly, since g(ti; ∞, η̂1) corresponds to the sum of the largest 51 wavelet components. The estimated coefficients are β̂o = −0.36 and β̂1 = 1.95. If we assume (incorrectly, of course) that η1 had been known a priori, then we can give confidence intervals for both parameters as in linear least squares regression. These intervals are generally too short, since they do not take into account that η1 is estimated. However, if a null hypothesis is not rejected using these intervals, then it will not be rejected by the correct test either. In our case, the linear regression confidence intervals for βo and β1 are [−0.96, 0.24] and [1.81, 2.09] respectively, and thus contain the true values βo = 0 and β1 = 2.


Figure 5.3 Simulated HIWAVE model: explanatory series g1 (a), y-series (b), y versus x (c), y versus g1 (d), y versus g2 = x − g1 (e), and time-frequency plot of y (f).


COLOR FIGURE 2.30 The minnesinger Burchard von Wengen (1229-1280), contemporary of Adam de la Halle (1235?-1288). (From Codex Manesse, courtesy of the University Library, Heidelberg.)

COLOR FIGURE 2.35 Symbol plot with x = pj5, y = pj7, and radius of circlesproportional to pj6.


COLOR FIGURE 2.36 Symbol plot with x = pj5, y = pj7. The rectangles have width pj1 (diminished second) and height pj6 (augmented fourth).

COLOR FIGURE 2.37 Symbol plot with x = pj5, y = pj7, and triangles defined by pj1 (diminished second), pj6 (augmented fourth), and pj10 (diminished seventh).


COLOR FIGURE 3.2 Fractal pictures (by Céline Beran, computer generated).

COLOR FIGURE 2.38 Names plotted at locations (x, y) = (pj5, pj7).


COLOR FIGURE 4.7 A harpsichord sound and its spectrogram. Intense pink corresponds to high values of I(t, λ).

COLOR FIGURE 9.6 Graduale written for an Augustinian monastery of the diocese Konstanz, 13th century. (Courtesy of Zentralbibliothek Zürich.)


Figure 5.4 HIWAVE time series and fitted function g1.

5.3 Specific applications in music

5.3.1 Hierarchical decomposition of metric, melodic, and harmonic weights

Decomposition of metric, melodic, and harmonic weights as in (5.3) and (5.4) can reveal structures and relationships that are not obvious in the original series. To illustrate this, Figures 5.5a through d and 5.5e through h show a decomposition of these weights for Bach's Canon cancricans from "Das Musikalische Opfer" BWV 1079 and Webern's Variation op. 27/2 respectively. The bandwidths were chosen based on time signature and bar grouping. Webern's piano piece is written in 2/4 signature; its formal grouping is 1 + 11 + 11 + 11 + 11; however, Webern insists on a grouping in 2-bar portions, suggesting the bandwidths 5.5 (11 bars), 1 (2 bars), and 0.5 (1 bar). Bach's canon is written in 4/4 signature; the grouping is 9 + 9 + 9 + 9. The chosen bandwidths are 9 (9 bars), 3 (3 bars), and 1 (1 bar). For both compositions, much stronger similarities between the smoothed metric, melodic, and harmonic components can be observed than for the original weights. An extended discussion of these and other examples can be found in Beran and Mazzola (1999a).


Figure 5.5 Hierarchical decomposition of metric, melodic, and harmonic indicators for Bach's "Canon cancricans" (Das Musikalische Opfer BWV 1079) and Webern's Variation op. 27, No. 2.

5.3.2 HIREG models of the relationship between tempo and melodic curves

Quantitative analysis of performance data is an attempt to understand "objectively" how musicians interpret a score (Figure 5.6). For the analysis of the tempo curves for Schumann's Träumerei (Figure 2.3), Beran and Mazzola (1999a) construct the following matrix of explanatory variables by decomposing structural weight functions into components of different smoothness: let x1 = xmetric = metric weight, x2 = xmelod = melodic weight, and x3 = xhmean = harmonic (mean) weight (see Chapter 3). Define the bandwidths b1 = 4 (4 bars), b2 = 2 (2 bars), and b3 = 1 (1 bar), and denote the corresponding components in the decomposition of x1, x2, x3 by xj,metric = xj,1, xj,melod = xj,2, xj,hmean = xj,3. More exactly, since harmonic weights are originally defined for each note, two alternative variables are considered for the harmonic aspect: xhmean(tl) = average harmonic weight at onset time tl, and xhmax(tl) = maximal harmonic weight at onset time tl. Thus, the decomposition of four different weight functions xmetric, xmelod, xhmean, and xhmax is used in the analysis. Moreover, for each curve, discrete derivatives are defined by

dx(tj) = (x(tj) − x(tj−1)) / (tj − tj−1)

and

dx^{(2)}(tj−1) = (dx(tj) − dx(tj−1)) / (tj − tj−1).

Each of these variables is decomposed hierarchically into four components, as described above, with the bandwidths b1 = 4 (weighted averaging over 8 bars), b2 = 2 (4 bars), b3 = 1 (2 bars), and b4 = 0 (residual; no averaging). We thus obtain 48 variables (functions):

xmetric,1    xmetric,2    xmetric,3    xmetric,4
dxmetric,1   dxmetric,2   dxmetric,3   dxmetric,4
d2xmetric,1  d2xmetric,2  d2xmetric,3  d2xmetric,4

xmelodic,1   xmelodic,2   xmelodic,3   xmelodic,4
dxmelodic,1  dxmelodic,2  dxmelodic,3  dxmelodic,4
d2xmelodic,1 d2xmelodic,2 d2xmelodic,3 d2xmelodic,4

xhmax,1      xhmax,2      xhmax,3      xhmax,4
dxhmax,1     dxhmax,2     dxhmax,3     dxhmax,4
d2xhmax,1    d2xhmax,2    d2xhmax,3    d2xhmax,4

xhmean,1     xhmean,2     xhmean,3     xhmean,4
dxhmean,1    dxhmean,2    dxhmean,3    dxhmean,4
d2xhmean,1   d2xhmean,2   d2xhmean,3   d2xhmean,4
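The hierarchical decomposition described above can be sketched in a few lines of code. The following is a minimal illustration, assuming a simple rectangular (moving-average) kernel and integer bandwidths; the function names are illustrative, not taken from the text. Each component is the difference between two successive smooths, so the components add back up to the original weight function.

```python
def smooth(x, b):
    """Smooth x with a rectangular window of half-width b (b = 0: no smoothing)."""
    n = len(x)
    out = []
    for t in range(n):
        lo, hi = max(0, t - b), min(n, t + b + 1)
        window = x[lo:hi]
        out.append(sum(window) / len(window))
    return out

def hierarchical_decomposition(x, bandwidths):
    """Split x into components of decreasing smoothness plus a residual.

    The first component is the smooth at the largest bandwidth; each further
    component is the difference between the smooths at two successive
    bandwidths; the last component is the residual x - smooth(x, b_last).
    By construction, the components sum back to x."""
    components = []
    previous = smooth(x, bandwidths[0])
    components.append(previous)
    for b in bandwidths[1:]:
        current = smooth(x, b)
        components.append([c - p for c, p in zip(current, previous)])
        previous = current
    components.append([xi - p for xi, p in zip(x, previous)])
    return components

def discrete_derivative(x, t):
    """dx(t_j) = (x(t_j) - x(t_{j-1})) / (t_j - t_{j-1})."""
    return [(x[j] - x[j - 1]) / (t[j] - t[j - 1]) for j in range(1, len(x))]
```

Applying `hierarchical_decomposition` to each of the four weight functions and to their first and second discrete derivatives, with four bandwidths, yields the 48 variables listed above.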

In addition to these variables, the following score information is modeled in a simple way:

1. Ritardandi: There are four onset intervals R1, R2, R3, and R4 with an explicitly written ritardando instruction, starting at onset times to(Rj) (j = 1, 2, 3, 4) respectively. This is modeled by linear functions

xritj(t) = 1{t ∈ Rj} · (t − to(Rj)), j = 1, 2, 3, 4   (5.35)


Figure 5.6 Quantitative analysis of performance data is an attempt to understand "objectively" how musicians interpret a score without attaching any subjective judgement. (Left: "Freddy" by J.B.; right: J.S. Bach, woodcutting by Ernst Wurtemberger, Zurich. Courtesy of Zentralbibliothek Zurich.)

2. Suspensions: There are four onset intervals S1, S2, S3, and S4 with suspensions, starting at onset times to(Sj) (j = 1, 2, 3, 4) respectively. The effect is modeled by the variables

xsusj(t) = 1{t ∈ Sj} · (t − to(Sj)), j = 1, 2, 3, 4   (5.36)

3. Fermatas: There are two onset intervals F1, F2 with fermatas. Their effect is modeled by indicator functions

xfermj(t) = 1{t ∈ Fj}, j = 1, 2   (5.37)

The variables are summarized in an n × 57 matrix X. After orthonormalization, the following model is assumed:

y(j) = Zβ(j) + ε(j)

where y(j) = [y(t1, j), y(t2, j), ..., y(tn, j)]^t are the tempo measurements for performance j, Z is the orthonormalized X-matrix, β(j) is the vector of coefficients (β1(j), ..., βp(j))^t, and ε(j) = [ε(t1, j), ε(t2, j), ..., ε(tn, j)]^t is a vector of n identically distributed, but possibly correlated, zero mean random variables ε(ti, j) (ti ∈ T) with variance var(ε(ti, j)) = σ2(j). Beran and Mazzola (1999a) select the most important variables for each of the 28 performances separately, by stepwise linear regression. The main aim of the analysis is to study the relationship between structural weight functions and tempo with respect to a) existence, b) type and complexity, and c) comparison of different performances. It should perhaps be emphasized at this point that quantitative analysis of performance data aims at gaining a better "objective" understanding of how pianists interpret a score


without attaching any subjective judgement. The aim is thus not to find the "ideal performance" (which may in fact not exist) or to state an opinion about the quality of a performance. The values of R2, obtained for the full model with all explanatory variables, vary between 0.65 and 0.85. Note, however, that the number of potential explanatory variables is very large, so that high values of R2 do not necessarily imply that the regression model is meaningful. On the other hand, musical performance is a very complex process. It is therefore not unreasonable that a large number of explanatory variables may be necessary. This is confirmed formally, in that for most performances the selected models turn out to be complex (with many variables), all variables being statistically significant (at the 5% level) even when correlations in the errors are taken into account. For instance, for Brendel's performance (R2 = 0.76), seventeen significant variables are selected (including first and second derivatives). In spite of the complexity, there is a large degree of similarity between the performances in the following sense: a) all except at most 3 of the 57 coefficients βj have the same sign for all performances (the results are therefore hardly random); b) there are "canonical" variables that are chosen by stepwise regression for (almost) all performances; and c) the same is true if one considers (for each performance separately) the explanatory variables with the largest coefficient. Figure 5.7 shows three of these curves. The upper curve is the most important explanatory variable for 24 of the 28 performances. The exceptions are all three Cortot performances and Krust, with a preference for the middle curve, which reflects the division of the piece into 8 parts, and the performance by Ashkenazy, with a curve similar to Cortot's.
Apparently, Cortot, Krust, and Ashkenazy put special emphasis on the division into 8 parts. The results can also be used to visualize the structure of tempo curves in the following way: using the size of |βk| as a criterion for the importance of variable k, we may add the terms in the regression equation sequentially to obtain a hierarchy of tempo curves ranging from very simple to complex. This is illustrated in Figures 5.8a and b for Ashkenazy and Horowitz's third performance.
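A sketch of this procedure: orthonormalize the columns of the design matrix (here by classical Gram–Schmidt), read off the least-squares coefficients, and aggregate the fitted components in order of decreasing |βk|. This is only an illustration of the idea under the assumption of linearly independent columns; the function names are hypothetical, not from the text.

```python
def gram_schmidt(columns):
    """Orthonormalize a list of column vectors (classical Gram-Schmidt)."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def hireg_fit(y, columns):
    """With an orthonormal design Z, the least-squares coefficients are
    simply beta_k = <z_k, y>.  Return (beta, fits), where fits[i] is the
    curve obtained after adding the i+1 components with largest |beta_k|,
    giving a hierarchy of tempo curves from very simple to complex."""
    Z = gram_schmidt(columns)
    beta = [sum(zi * yi for zi, yi in zip(z, y)) for z in Z]
    order = sorted(range(len(beta)), key=lambda k: -abs(beta[k]))
    fits, current = [], [0.0] * len(y)
    for k in order:
        current = [c + beta[k] * zi for c, zi in zip(current, Z[k])]
        fits.append(list(current))
    return beta, fits
```

The successive elements of `fits` correspond to the successively aggregated curves shown in Figures 5.8a and b.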

5.3.3 HISMOOTH models for the relationship between tempo and structural curves

An analysis of the relationship between a melodic curve (Chapter 3) and the 28 tempo curves for Schumann's Träumerei is discussed in Beran and Mazzola (1999). In a first step, effects of fermatas and ritardandi are subtracted from each of the 28 tempo series individually, using linear regression. The component of the melodic curve mt orthogonal to these variables is then used. The second algorithm for HISMOOTH models is used, with a grid G that takes into account that 0 ≤ t ≤ 32 and that only certain multiples of 1/8 correspond to musically interesting neighborhoods: G = {32, 30, 28, 26, 24,


Figure 5.7 Most important melodic curves obtained from HIREG fit to tempo curves for Schumann's Träumerei.

[Figure 5.8a panel: "Adding effects for ASKENAZE" — estimated and observed log(tempo) vs. onset time]
[Figure 5.8b panel: "Adding effects for HOROWITZ3" — estimated and observed log(tempo) vs. onset time]

Figure 5.8 Successive aggregation of HIREG-components for tempo curves byAshkenazy and Horowitz (third performance).


22, 20, 18, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.5, 1, 0.75, 0.5, 0.25, 0.125}. Note that, since for large bandwidths the resulting curves g do not vary much, large trial bandwidths do not need to be too close together. The error process ε is modeled by a fractional AR(p, d) process, the order being estimated from the data by the BIC. Note that, from the musicological point of view, the fractional differencing parameter can be interpreted as a measure of self-similarity (see Chapter 3). For illustration, consider the performances CORTOT1 and HOROWITZ1

(see Figures 5.9b and c). In both cases, the number M of explanatory variables estimated by Algorithm 2 turns out to be 3 (with a level of significance of α = 0.05). The estimated bandwidths (and 95%-confidence intervals) are b1 = 4.0 ([2.66, 5.34]), b2 = 2.0 ([1.10, 2.90]) and b3 = 0.5 ([0.17, 0.83]) for CORTOT1, and b1 = 4 ([2.26, 5.74]), b2 = 1 ([0.39, 1.62]) and b3 = 0.25 ([0.04, 0.46]) for HOROWITZ1. The estimates of β are β1 = −0.81 ([−1.53, −0.10]), β2 = 1.08 ([0.21, 1.05]) and β3 = −0.624 ([−1.15, −0.10]), and β1 = −0.42 ([−0.66, −0.18]), β2 = 0.54 ([0.13, 0.95]) and β3 = −0.68 ([−1.08, −0.28]) respectively. Finally, the fitted error process for Cortot is a fractional AR(1) process with d = −0.25 ([−0.60, 0.09]) and φ1 = 0.77 ([0.48, 1]). For Horowitz we obtain a fractional AR(2) process with d = 0.30 ([0.14, 0.45]), φ1 = 0.26 ([0.09, 0.42]) and φ2 = −0.43 ([−0.55, −0.30]).

A possible interpretation of the results is as follows: the largest bandwidth b1 = 4 (one bar) is the same for both performers. A relatively large portion of the shaping of the tempo "happens" at this level. Apart from this, however, Horowitz's bandwidths are smaller. Horowitz appears to emphasize very local melodic structures more than Cortot. Moreover, for Horowitz, d > 0 (long-range dependence): while the small-scale structures are "explained" by the melodic structure of the score, the remaining "unexplained" part of the performance is still "coherent" in the sense that there is a relatively strong (self-)similarity and positive correlations even between remote parts. For Cortot, on the other hand, d < 0 (antipersistence): while larger-scale structures are "explained" by the melodic structure of the score, more local fluctuations are still "coherent" in the sense that there is a relatively strong negative autocorrelation even between remote parts; these smaller-scale structures are, however, difficult to relate directly to the melodic structure of the score.

Figures 5.9a through d also show simplified tempo curves for all 28 performances, obtained by HISMOOTH fits with M = 3. The comparison of typical characteristics is now much easier than for the original curves. In particular, there is a strong similarity between all three performances by Horowitz on one hand, and the three performances by Cortot on the other hand. Several performers (Moisewitsch, Novaes, Ortiz, Krust, Schnabel, Katsaris) put even higher emphasis on global melodic features than Cortot. Striking similarities can also be seen between Horowitz, Klien, and


Figure 5.9 a and b: HISMOOTH-fits to tempo curves (performances 1-14).

Brendel. Another group of similar performances consists of Cortot, Argerich, Capova, Demus, Kubalek, and Shelley.

5.3.4 Digital encoding of musical sounds (CD, mpeg)

Wavelet decomposition plays an important role in modern techniques of digital sound and image processing. Digital encoding of sounds (e.g. CD, mpeg) relies on algorithms that make it possible to compress complex data into as few storage units as possible. Wavelet decomposition is one such technique: instead of storing a complete function (evaluated or measured at a very large number of time points on a fine grid), one only needs to keep a relatively small number of wavelet coefficients. There is an extensive literature on how exactly this can be done to suit particular engineering needs. Since the focus here is on "genuine" musical questions rather than signal processing, we do not pursue this further. The interested reader is referred to the engineering literature, such as Effelsberg and Steinmetz (1998) and references therein.
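As an illustration of the compression idea, here is a bare-bones sketch using Haar wavelets (signal length a power of two): transform, keep only the largest-magnitude coefficients, and invert. This is a toy version of the principle, not the actual CD or mpeg algorithms.

```python
def haar_forward(x):
    """Full Haar wavelet transform of a signal of length 2^k:
    repeatedly replace the signal by pairwise averages (smooth part)
    and pairwise half-differences (detail coefficients)."""
    coeffs = list(x)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        smooth = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = smooth + detail
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward."""
    coeffs = list(coeffs)
    n = 1
    while n < len(coeffs):
        smooth, detail = coeffs[:n], coeffs[n:2 * n]
        rebuilt = []
        for s, d in zip(smooth, detail):
            rebuilt += [s + d, s - d]
        coeffs[:2 * n] = rebuilt
        n *= 2
    return coeffs

def compress(x, keep):
    """Zero out all but the `keep` largest-magnitude coefficients
    (ties at the threshold are all kept)."""
    c = haar_forward(x)
    threshold = sorted((abs(v) for v in c), reverse=True)[keep - 1]
    return [v if abs(v) >= threshold else 0.0 for v in c]
```

Storing only the retained coefficients (and their positions) instead of the full signal is the basic compression step; real encoders add quantization and entropy coding on top.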

5.3.5 Wavelet analysis of tempo curves

Consider the tempo curves for Schumann's Träumerei. Wavelet analysis can help one to understand some of the similarities and differences between tempo curves.

Figure 5.9 c and d: HISMOOTH-fits to tempo curves (performances 15-28).

This is illustrated in Figures 5.10a through f, where time-frequency plots of the three tempo curves by Cortot are compared with those by Horowitz. (More specifically, only the first 128 observations are used here.) The obvious difference is that Horowitz has more power in the high-frequency range. Figures 5.11a through f compare the wavelet coefficients of residuals obtained after subtracting a kernel-smoothed version of the tempo curves (bandwidth 1/8, i.e. averaging was done over one quarter of a bar). This provides an overview of local details of the curves. In particular, it can be seen at which level of resolution each pianist kept essentially the same "profile" throughout the years. For instance, for Horowitz the complete profile at level 2 (d2) remains essentially the same. An even better adaptation to the data is achieved by using so-called wavelet packets, which are generalizations of wavelets, in conjunction with a "best-basis algorithm". The idea of the algorithm is to find the type of basis functions best suited to approximate an observed time series with as few basis functions as possible. This is a way out of the limitation due to the very specific shape of a particular class of wavelet functions (see e.g. Haar wavelets, where we are confined to step functions). For detailed references on wavelet packets see e.g. Coifman et al. (1992) and Coifman and Wickerhauser (1992). Figures 5.12 through 5.14 illustrate the usefulness of this approach: the 28 tempo curves of Schumann's Träumerei are approximated by the most important


Figure 5.10 Time-frequency plots for Cortot's and Horowitz's three performances.

two (Figure 5.12), five (Figure 5.13), and ten (Figure 5.14) best basis functions. The plots show interesting and plausible similarities and differences. Particularly striking are Cortot's 4-bar oscillations, Horowitz's "seismic" local fluctuations, the relatively unbalanced tempo with a few extreme tempo variations for Eschenbach, Klien, Ortiz, and Schnabel, the irregular shapes for Moisewitsch, and also a strong similarity between Horowitz1 and Moisewitsch with respect to the general shape (Figure 5.12).

5.3.6 HIWAVE models of the relationship between tempo and melodic curves

HIWAVE models can be used, for instance, to establish a relationship between structural curves obtained from a score and a performance of the score. Here, we consider the tempo curves by Cortot and Horowitz (Figure 5.15a), and the melodic weight function m(t) defined in Section 3.3.4. Assuming a HIWAVE model of order 1, Figure 5.15b displays the value of R2


[Figure 5.11 panels a–f: wavelet coefficients (levels s2, d2, d1, idwt) of residuals for Cortot1, Cortot2, Cortot3, Horowitz1, Horowitz2, Horowitz3; x-axis 0–100]

Figure 5.11 Wavelet coefficients for Cortot’s and Horowitz’s three performances.


[Figure 5.12 panels, one tempo-curve approximation per performance: ARGERICH, ARRAU, ASKENAZE, BRENDEL, BUNIN, CAPOVA, CORTOT1–3, CURZON, DAVIES, DEMUS, ESCHENBACH, GIANOLI, HOROWITZ1–3, KATSARIS, KLIEN, KRUST, KUBALEK, MOISEIWITSCH, NEY, NOVAES, ORTIZ, SCHNABEL, SHELLEY, ZAK; x-axis: onset time 0–150]

Figure 5.12 Tempo curves – approximation by the 2 most important best-basis functions.

[Figure 5.13 panels: one plot per performance, same 28 performers and axes as in Figure 5.12]

Figure 5.13 Tempo curves – approximation by the 5 most important best-basis functions.


[Figure 5.14 panels: one plot per performance, same 28 performers and axes as in Figure 5.12]

Figure 5.14 Tempo curves – approximation by the 10 most important best-basis functions.

for the simple linear regression model yi = β0 + β1 g(ti; ∞, η) as a function of the number of wavelet coefficients of mi that are greater than or equal to η. Two observations can be made: a) for almost all choices of η, the fit for Horowitz (gray lines) is better, and b) the best value of η is practically the same for all six performances. Figure 5.15c shows the fitted HIWAVE-curves for Cortot and Horowitz separately. The result shows an amazing agreement between the three Cortot performances on one hand and the three Horowitz curves on the other hand. The HIWAVE-fits seem to have extracted a major aspect of the performance styles. Horowitz appears to build blocks of almost horizontal tempo levels and "adds", within these blocks, very fine tempo variations. In contrast, for Cortot, blocks have a more "parabolic" shape. It should be noted, of course, that, since Haar wavelets were used here, these features (in particular Horowitz's horizontal blocks) may be somewhat overemphasized. Analogous pictures are displayed in Figures 5.16a through c and 5.17a through c for the first and second difference of the tempo respectively. Particularly interesting are Figures 5.17b and c: the values of R2 are practically the same for all Horowitz performances and clearly lower than for Cortot. Moreover, as before, both pianists show an amazing consistency in their performances.


Figure 5.15 Tempo curves (a) by Cortot (three curves on top) and Horowitz, R2 obtained in HIWAVE-fit plotted against trial cut-off parameter η (b), and fitted HIWAVE-curves (c).


Figure 5.16 First derivative of tempo curves (a) by Cortot (three curves on top) and Horowitz, R2 obtained in HIWAVE-fit plotted against trial cut-off parameter η (b), and fitted HIWAVE-curves (c).


Figure 5.17 Second derivative of tempo curves (a) by Cortot (three curves on top) and Horowitz, R2 obtained in HIWAVE-fit plotted against trial cut-off parameter η (b), and fitted HIWAVE-curves (c).


CHAPTER 6

Markov chains and hidden Markov models

6.1 Musical motivation

Musical events can often be classified into a finite or countable number of categories that occur in a temporal sequence. A natural question is then whether the transitions between different categories can be characterized by probabilities. In particular, a successful model may be able to reproduce formally a listener's expectation of "what happens next", by giving appropriate conditional probabilities. Markov chains are simple models in discrete time that are defined by conditioning on the immediate past only. The theory of Markov chains is well developed and many beautiful results are available. More complicated, but very flexible, are hidden Markov processes. For these models, the probability distribution itself changes dynamically according to a Markov process. Many of the developments on hidden Markov models have been stimulated by problems in speech recognition. It is therefore not surprising that these models are also very useful for analyzing musical signals. Here, a very brief introduction to Markov chains and hidden Markov models is given. For an extended discussion see, for instance, Chung (1967), Isaacson and Madsen (1976), Kemeny et al. (1976), Billingsley (1986), Elliott et al. (1995), MacDonald and Zucchini (1997), Norris (1998), Bremaud (1999).

6.2 Basic principles

6.2.1 Definition of Markov chains

Let X0, X1, ... be a sequence of random variables with possible outcomes Xt = xt ∈ S. Then the sequence is called a Markov chain, if

M1. The state space S is finite or countable;

M2. For any t ∈ N,

P(Xt+1 = j | X0 = i0, X1 = i1, ..., Xt = it) = P(Xt+1 = j | Xt = it)   (6.1)

Condition M2 means that the future development of the process, given the past, depends on the most recent value only. In the following we also


assume that the Markov chain is homogeneous, in the sense that for any i, j ∈ S the conditional probability P(Xt+1 = j | Xt = i) does not depend on time t. The probability distribution of the process Xt (t = 0, 1, 2, ...) is then fully specified by the initial distribution

πi = P(X0 = i)   (6.2)

and the (finite or infinite dimensional) matrix of transition probabilities

pij = P(Xt+1 = j | Xt = i)   (i, j = 1, 2, ..., |S|)   (6.3)

where |S| = m ≤ ∞ is the number of elements in the state space S. Without loss of generality, we may assume S = {1, 2, ..., m}. Note that the vector π = (π1, ..., πm)^t and the matrix

M = (pij)i,j=1,2,...,m

have the following properties:

0 ≤ πi, pij ≤ 1,   ∑_{i=1}^{m} πi = 1   and   ∑_{j=1}^{m} pij = 1.

Probabilities of events can be obtained by matrix multiplication, since

p^(n)_ij = P(Xt+n = j | Xt = i) = ∑_{j1,...,j(n−1)=1}^{m} p_{i j1} p_{j1 j2} ··· p_{j(n−1) j} = [M^n]_ij   (6.4)

and

p^(n)_j = P(Xt+n = j) = [π^t M^n]_j.   (6.5)
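Equations (6.4) and (6.5) translate directly into matrix arithmetic. A small sketch in plain Python (matrices as lists of rows; purely illustrative):

```python
def mat_mult(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def n_step(M, n):
    """n-step transition matrix M^n, so that p^(n)_ij = [M^n]_ij (eq. 6.4)."""
    P = M
    for _ in range(n - 1):
        P = mat_mult(P, M)
    return P

def marginal(pi, M, n):
    """Distribution of X_{t+n} given X_t ~ pi: the row vector pi^t M^n (eq. 6.5)."""
    P = n_step(M, n)
    m = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
```

For example, with a two-state chain, `n_step(M, 2)` gives all two-step probabilities, and each of its rows again sums to one.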

6.2.2 Transience, persistence, irreducibility, periodicity, and stationarity

The dynamic behavior of a Markov chain can essentially be characterized by the notions of transience–persistence, irreducibility–reducibility, aperiodicity–periodicity, and stationarity–nonstationarity. These properties will be discussed now.

Consider the probability that the first visit to state j occurs at time n, given that the process started in state i:

f^(n)_ij = P(X1 ≠ j, ..., X(n−1) ≠ j, Xn = j | X0 = i)   (6.6)

Note that f^(n)_ij can also be written as

f^(n)_ij = P(Tj = n | X0 = i)


where

Tj = min{n ≥ 1 : Xn = j}

is the first time at which the process reaches state j. The conditional probability that the process ever visits state j can be written as

fij = P(Tj < ∞ | X0 = i) = P(∪_{n=1}^{∞} {Xn = j} | X0 = i) = ∑_{n=1}^{∞} f^(n)_ij   (6.7)

We then have the following

Definition 42 A state i is called
i) transient, if fii < 1;
ii) persistent, if fii = 1.

Persistence means that we return to the same state again with certainty. For transient states it can occur, with positive probability, that we never return to the same place. As it turns out, a positive probability of never returning implies that there is indeed a "point of no return", i.e. a time point after which one never returns. This can be seen as follows. Conditionally on X0 = i, the probability that state j is reached at least k + 1 times is equal to fij f^k_jj. Hence, for k → ∞, we obtain the probability of returning infinitely often:

qij = P(Xn = j infinitely often | X0 = i) = fij lim_{k→∞} f^k_jj.   (6.8)

This implies

qij = 0 for fjj < 1

and

qij = 1 for fjj = 1.

A simple way of checking whether a state is persistent or not is given by

Theorem 13 The following holds for a Markov chain:

i) A state j is transient ⇔ qjj = 0 ⇔ ∑_{n=1}^{∞} p^(n)_jj < ∞;

ii) A state j is persistent ⇔ qjj = 1 ⇔ ∑_{n=1}^{∞} p^(n)_jj = ∞.

The condition on ∑_{n=1}^{∞} p^(n)_ii can be simplified further for irreducible Markov chains:

Definition 43 A Markov chain is called irreducible if, for each i, j ∈ S, p^(n)_ij > 0 for some n.

Irreducibility means that wherever we start, any state j can be reached in due time with positive probability. This excludes the possibility of being caught forever in a certain subset of S. With respect to persistent and transient states, the situation simplifies greatly for irreducible Markov chains:

Theorem 14 Suppose that Xt (t = 0, 1, ...) is an irreducible Markov chain. Then one of the following possibilities is true:


i) All states are transient.
ii) All states are persistent.

Instead of speaking of transient and persistent states, one therefore also uses the notions of a transient and a persistent Markov chain respectively.

Another important property is stationarity of Markov chains. The word "stationarity" implies that the distribution remains stable in some sense. The first definition concerns initial distributions:

Definition 44 A distribution π is called stationary if

∑_{i=1}^{m} πi pij = πj,   (6.9)

or in matrix form,

π^t M = π^t.   (6.10)

This means that if we start with distribution π, then the distribution of all subsequent Xt's is again π.

The next question is how far the initial distribution influences the

dynamic behavior (probability distribution) into the infinite future. A possible complication is that the process may be periodic, in the sense that one may return to certain states periodically:

Definition 45 A state j is said to have period τ if

p^(n)_jj > 0

implies that n is a multiple of τ.

For an irreducible Markov chain, all states have the same period. Hence, the following definition is meaningful:

Definition 46 An irreducible Markov chain is called periodic if τ > 1, and it is called aperiodic if τ = 1.

It can be shown that for an aperiodic Markov chain there is at most one stationary distribution and, if there is one, then the initial distribution ultimately does not play any role:

Theorem 15 If Xt (t = 0, 1, ...) is an aperiodic irreducible Markov chain for which a stationary distribution π exists, then the following holds:

(i) the Markov chain is persistent;
(ii) lim_{n→∞} p^(n)_ij = πj > 0 for all i, j;
(iii) the stationary distribution π is unique.

In the other case, of an aperiodic irreducible Markov chain for which no stationary distribution exists, we have

lim_{n→∞} p^(n)_ij = 0

for all i, j. Note that this is even the case if the Markov chain is persistent. One can then classify irreducible aperiodic Markov chains into three classes:


Theorem 16 If Xt (t = 0, 1, 2, ...) is an irreducible aperiodic Markov chain, then one of the following three possibilities is true:

(i) Xt is transient,

lim_{n→∞} p^(n)_ij = 0   and   ∑_{n=1}^{∞} p^(n)_ij < ∞;

(ii) Xt is persistent, but no stationary distribution π exists,

lim_{n→∞} p^(n)_ij = 0,   ∑_{n=1}^{∞} p^(n)_ij = ∞

and

μj = ∑_{n=1}^{∞} n f^(n)_jj = ∞;

(iii) Xt is persistent, and a unique stationary distribution π exists,

lim_{n→∞} p^(n)_ij = πj > 0

for all i, j, and the average number of steps until the process returns to state j is given by

μj = πj^(−1).

For Markov chains with a finite state space, the results simplify further:

Theorem 17 If Xt is an irreducible aperiodic Markov chain with a finite state space, then the following holds:

(i) Xt is persistent;

(ii) a unique stationary distribution π = (π1, ..., πm)^t exists and is the solution of

π^t (I − M) = 0,   (0 ≤ πj ≤ 1, ∑ πj = 1)   (6.11)

where I is the m × m identity matrix.

Note that ∑_j M_ij = ∑_j p_ij = 1, so that ∑_j (I − M)_ij = 0, i.e. the matrix (I − M) is singular. (If this were not the case, then the only solution to the system of linear equations would be 0, so that no stationary distribution would exist.) Thus, there are infinitely many solutions of (6.11). However, there is only one solution that satisfies the conditions 0 ≤ πj ≤ 1 and ∑ πj = 1.
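For a small finite chain the stationary distribution can be computed directly. The sketch below simply iterates π^t ← π^t M, which by Theorem 15 converges for an irreducible aperiodic chain; solving the linear system π^t(I − M) = 0 under the constraints of Theorem 17 would give the same answer.

```python
def stationary(M, iterations=10000):
    """Stationary distribution of a finite, irreducible, aperiodic chain,
    found by iterating pi^t <- pi^t M from the uniform distribution
    (Theorem 15 guarantees convergence; equivalently, pi solves
    pi^t (I - M) = 0 with 0 <= pi_j <= 1 and sum(pi) = 1)."""
    m = len(M)
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi = [sum(pi[i] * M[i][j] for i in range(m)) for j in range(m)]
    return pi
```

For the two-state chain with p11 = 0.9 and p21 = 0.5, the balance equation gives π = (5/6, 1/6), which the iteration reproduces.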


6.2.3 Hidden Markov models

A hidden Markov model is, as the name says, a model in which an underlying Markov process is not directly observable. Instead, observations Xt (t = 1, 2, ...) are generated by a series of probability distributions which in turn are controlled by an unobserved Markov chain. More specifically, the following definitions are used: let θt (t = 1, 2, ...) be a Markov chain with initial distribution π, so that P(θ1 = j) = πj, and transition probabilities

pij = P (θt+1 = j|θt = i). (6.12)

The state of the Markov chain determines the probability distribution ofthe observable random variables Xt by

ψij = P (Xt = j|θt = i) (6.13)

In particular, if the state spaces of θt and Xt are finite with dimensionsm1 and m2 respectively, then the probability distribution of the process Xt

is determined by the m1-dimensional vector π, the m1 × m1-dimensionaltransition matrixM = (pij)i,j=1,...,m1 and the m2×m1-dimensional matrixΨ = (ψij)i=1,...,m2;j=1,...,m1 that links θt with Xt. Analogous models canbe defined for the case where Xt (t ∈ N) are continuous variables.The flexibility of hidden Markov models is due to the fact that Xt can

be an arbitrary quantity with an arbitrary distribution that can change in time. For instance, Xt itself can be equal to a time series Xt = (Z1, ..., Zn) = (Z1(t), ..., Zn(t)) whose distribution depends on θt. Typically, such models are used in automatic speech processing (see e.g. Levinson et al. 1983, Juang and Rabiner 1991). The variable θt may represent the unobservable state of the vocal tract at time t, which in turn "produces" an observable acoustic signal Z1(t), ..., Zn(t) generated by a distribution characterized by θt. Given observations Xt (t = 1, 2, ..., N), the aim is to guess which configurations θt (t = 1, 2, ..., N) the vocal tract was in. More specifically, it is sometimes assumed that there is only a finite number of possible acoustic signals. We may therefore denote by Xt the label of the observed signal and estimate θt by maximizing the a posteriori probability P(θt = j | Xt = i). Using the Bayes rule, this leads to

θ̂t = arg max_{j=1,...,m1} P(θt = j | Xt = i)

= arg max_{j=1,...,m1} [ P(Xt = i | θt = j) P(θt = j) / ∑_{l=1}^{m1} P(Xt = i | θt = l) P(θt = l) ]   (6.14)
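A sketch of (6.14) in code, with Ψ stored as in the text (rows indexed by the observation label, columns by the hidden state); the numbers in the usage example below are invented purely for illustration.

```python
def map_state(x, prior, Psi):
    """MAP estimate of the hidden state theta_t given observation X_t = x
    (eq. 6.14): maximize P(X = x | theta = j) * P(theta = j) over j.
    Psi[i][j] = P(X_t = i | theta_t = j); prior[j] = P(theta_t = j).
    Returns the maximizing state and the full posterior distribution."""
    numerators = [Psi[x][j] * prior[j] for j in range(len(prior))]
    total = sum(numerators)  # the denominator in (6.14)
    posterior = [p / total for p in numerators]
    best = max(range(len(prior)), key=lambda j: posterior[j])
    return best, posterior
```

Note that the denominator does not depend on j, so for the argmax alone it could be dropped; it is kept here so that the full posterior is also returned.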

6.2.4 Parameter estimation for Markov and hidden Markov models

In principle, parameter estimation for Markov chains and hidden Markov models is simple, since the likelihood function can be written down explicitly in terms of simple conditional probabilities. The main difficulties that can occur are:

1. Large number of unknown parameters: the unknown parameters for aMarkov chain are the initial distribution π and the transition matrixM = (pij)i,j=1,...,m. If m is finite, then the number of unknown parame-ters is (m−1)+m(m−1). If the initial distribution does not matter, thenthis reduces to m(m − 1). Both numbers can be quite large comparedto the available sample size, since they increase quadratically in m. Thesituation is even worse if the state space is infinite, since then the num-ber of unknown parameters is infinite. A solution to this problem is toimpose restrictions on the parameters or to define parsimonious modelswhere M is characterized by a low-dimensional parameter vector.

2. Implicit solution: The maximum likelihood estimate of the unknownparameters is the solution of a system of nonlinear equations, and there-fore must be found by a suitable numerical algorithm. For real timeapplications with massive data input, as they typically occur in speechprocessing or processing of musical sound signals, fast algorithms arerequired.

3. Asymptotic distribution: the asymptotic distribution of maximum likelihood estimates is not always easy to derive.

6.3 Specific applications in music

6.3.1 Stationary distribution of intervals modulo 12

We consider intervals between successive notes modulo octave for the upper envelopes of the following compositions:

• Anonymus: a) Saltarello (13th century); b) Saltarello (14th century); c) Alle Psallite (13th century); d) Troto (13th century)

• A. de la Halle (1235?-1287): Or est Bayard en la pature, hure!

• J. de Ockeghem (1425-1495): Canon epidiatesseron

• J. Arcadelt (1505-1568): a) Ave Maria, b) La Ingratitud, c) Io Dico Fra Noi

• W. Byrd (1543-1623): a) Ave Verum Corpus, b) Alman, c) The Queen's Alman

• J. Dowland (1562-1626): a) Come Again, b) The Frog Galliard, c) The King of Denmark's Galliard

• H.L. Hassler (1564-1612): a) Galliard, b) Kyrie from "Missa secunda", c) Sanctus et Benedictus from "Missa secunda"

• G.P. Palestrina (1525-1594): a) Jesu Rex admirabilis, b) O bone Jesu, c) Pueri Hebraeorum


• J.P. Rameau (1683-1764): a) La Popliniere, b) Tambourin, c) La Triomphante (Figure 6.1)

• J.F. Couperin (1668-1733): a) Barricades mysterieuses, b) La Linotte Effarouchee, c) Les Moissonneurs, d) Les Papillons

• J.S. Bach (1685-1750): Das Wohltemperierte Klavier; Cello Suites I to VI (1st movements)

• D. Scarlatti (1660-1725): a) Sonata K 222, b) Sonata K 345, c) Sonata K 381

• J. Haydn (1732-1809): Sonata op. 34, No. 2

• W.A. Mozart (1756-1791): a) Sonata KV 332, 2nd Mov., b) Sonata KV 545, 2nd Mov., c) Sonata KV 333, 2nd Mov.

• F. Chopin (1810-1849): a) Nocturne op. 9, No. 2, b) Nocturne op. 32, No. 1, c) Etude op. 10, No. 6 (Figure 6.2)

• R. Schumann (1810-1856): Kinderszenen op. 15

• J. Brahms (1833-1897): a) Hungarian dances No. 1, 2, 3, 6, 7, b) Intermezzo op. 117, No. 1 (Figures 6.12, 9.7, 11.5)

• C. Debussy (1862-1918): a) Clair de lune, b) Arabesque No. 1, c) Reflets dans l'eau

• A. Scriabin (1872-1915): Preludes a) op. 2, No. 2, b) op. 11, No. 14, c) op. 13, No. 2

• S. Rachmaninoff (1873-1943): a) Prelude op. 3, No. 2, b) Preludes op. 23, No. 3, 5, 9

• B. Bartok (1881-1945): a) Bagatelle op. 11, No. 2, b) Bagatelle op. 11, No. 3, c) Sonata for piano

• O. Messiaen (1908-1992): Vingt regards sur l'enfant de Jesus, No. 3

• S. Prokoffieff (1891-1953): Visions fugitives a) No. 11, b) No. 12, c) No. 13

• A. Schonberg (1874-1951): Piano piece op. 19, No. 2

• T. Takemitsu (1930-1996): Rain tree sketch No. 1

• A. Webern (1883-1945): Orchesterstuck op. 6, No. 6

Since we are not interested in note repetitions, zero is excluded, i.e. the state space of Xt consists of the numbers 1, ..., 11. For the sake of simplicity, Xt is assumed to be a Markov chain. This is, of course, not really true; nevertheless, an "approximation" by a Markov chain may reveal certain characteristics of the composition. The elements of the transition matrix M = (pij)i,j=1,...,11 are estimated by relative frequencies

p̂ij = ∑_{t=2}^n 1{xt−1 = i, xt = j} / ∑_{t=1}^{n−1} 1{xt = i},   (6.15)


Figure 6.1 Jean-Philippe Rameau (1683-1764). (Engraving by A. St. Aubin afterJ. J. Cafferi, Paris after 1764; courtesy of Zentralbibliothek Zurich.)

and the stationary distribution π of the Markov chain with transition matrix M = (pij)i,j=1,...,11 is estimated by solving the system of linear equations

πt(I − M) = 0

as described above. Figures 6.3a through l show the resulting values of πj (joined by lines). For each composition, the vector πj is plotted against j. For visual clarity, points at neighboring states j and j − 1 are connected. The figures illustrate how the characteristic shape of π changed in the course of the last 500 years. The most dramatic change occurred in the 20th century with a "flattening" of the peaks. Starting with Scriabin, a pioneer of atonal music though still rooted in the romantic style of the late 19th century, this is most extreme for the compositions by Schonberg, Webern, Takemitsu, and Messiaen. On the other hand, Prokoffieff's "Visions fugitives" exhibit clear peaks, but at varying locations. The estimated stationary distributions can also be used to perform a cluster analysis. Figure 6.4 shows the result of the single linkage algorithm with the Manhattan norm (see Chapter 10). To make names legible, only a subsample of the data was used. An almost perfect separation between Bach and composers from the classical and romantic period can be seen.
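The estimation steps above can be sketched in a few lines: transition probabilities by relative frequencies as in (6.15), and the stationary distribution obtained by iterating π ← πM (power method) rather than solving π^t(I − M) = 0 directly. The interval sequence below is invented for illustration:

```python
# Minimal sketch: relative-frequency estimate of a Markov transition matrix
# and its stationary distribution via power iteration. The interval sequence
# is an invented toy example, not data from the text.

def transition_matrix(seq, states):
    idx = {s: k for k, s in enumerate(states)}
    m = len(states)
    counts = [[0.0] * m for _ in range(m)]
    for a, b in zip(seq, seq[1:]):
        counts[idx[a]][idx[b]] += 1.0
    # normalize rows; never-visited states get a uniform row to keep M stochastic
    M = []
    for row in counts:
        s = sum(row)
        M.append([c / s for c in row] if s > 0 else [1.0 / m] * m)
    return M

def stationary(M, iters=2000):
    m = len(M)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(m)) for j in range(m)]
    return pi

seq = [2, 2, 5, 2, 7, 5, 2, 2, 5, 7, 2, 5]   # invented interval sequence
M = transition_matrix(seq, states=[2, 5, 7])
pi = stationary(M)
print([round(p, 3) for p in pi])   # estimated stationary distribution
```

Power iteration converges here because the chain is irreducible and aperiodic; for a chain with absorbing or periodic structure, solving the linear system directly would be safer.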


Figure 6.2 Frederic Chopin (1810-1849). (Courtesy of Zentralbibliothek Zurich.)

6.3.2 Stationary distribution of interval torus values

An analogous analysis can be carried out replacing the interval numbers by the corresponding values of the torus distance (see Chapter 1). Excluding zeroes, the state space consists of the three numbers 1, 2, 3 only. For the same compositions as above, the stationary probabilities πj (j = 1, 2, 3) are calculated. A cluster analysis as above, but with the new probabilities, yields practically the same result as before (Figure 6.5). Since the state space contains three elements only, it is now even easier to find the patterns that determine clustering. In particular, log-odds-ratios log(πi/πj) (i ≠ j) appear to be characteristic. Boxplots are shown in Figures 6.6a, 6.7a and 6.8a for categories of composers defined by date of birth as follows: a) before 1600 ("early music"); b) [1600, 1720) ("baroque"); c) [1720, 1800) ("classic"); d) [1800, 1880) ("romantic and early 20th century") (Figure 6.12); e) 1880 and later ("20th century"). This is a simple, though somewhat arbitrary, division with some inaccuracies; for instance, Schonberg is classified in category 4 instead of 5. The log-odds-ratio between π1 and π2 is highest in the "classical" period and generally tends to decrease afterwards. Moreover, there is a distinct jump from the baroque to the classical period. This jump is also visible for log(π1/π3). Here, however, the attained level is kept in the subsequent time periods. For log(π2/π3) a gradual increase


Figure 6.3 Stationary distributions πj (j = 1, ..., 11) of Markov chains with state space Z12 \ {0}, estimated for the transition between successive intervals.


[Figure: dendrogram, "Clusters based on stationary distribution"; leaf labels: BACH, HAYDN, MOZART, BRAHMS, CHOPIN, SCHUMANN, RACHMANINOFF.]

Figure 6.4 Cluster analysis based on stationary Markov chain distributions for compositions by Bach, Mozart, Haydn, Chopin, Schumann, Brahms, and Rachmaninoff.

can be observed. The differences are even more visible when comparing individual composers. This is illustrated in Figures 6.9a and b, where Bach's and Schumann's log(π1/π3) and log(π2/π3) are compared, and in Figures 6.10a through f, where the median and lower and upper quartiles of πj are plotted against j. Finally, Figure 6.11 shows the plots of log(π1/π3) and log(π2/π3) against the date of birth.

6.3.3 Classification by hidden Markov models

Chai and Vercoe (2001) study classification of folk songs using hidden Markov models. They consider, essentially, four ways of representing a melody; namely by a) a vector of pitches modulo 12; b) a vector of pitches modulo 12 together with duration (duration being represented by repeating the same pitch); c) a sequence of intervals (differenced series of pitches); and d) a sequence of intervals, with intervals being classified into only five interval classes {0}, {−1, −2}, {1, 2}, {x ≤ −3} and {x ≥ 3}. The observed data consist of 187 Irish, 200 German, and 104 Austrian homophonic melodies from folk songs. For each melody representation, the authors estimate the parameters of several hidden Markov models which differ mainly with respect to the size of the hidden state space. The models are fitted for each


[Figure: dendrogram, "Clusters based on stationary distribution of torus distances"; leaf labels: BACH, HAYDN, MOZART, BRAHMS, CHOPIN, SCHUMANN, RACHMANINOFF.]

Figure 6.5 Cluster analysis based on stationary Markov chain distributions of torus distances for compositions by Bach, Mozart, Haydn, Chopin, Schumann, Brahms, and Rachmaninoff.

country separately. Only 70% of the data are used for estimation. The remaining 30% are used for validation of a classification rule defined as follows: a melody is assigned to country j if the corresponding likelihood (calculated using the country's hidden Markov model) is the largest. Not surprisingly, the authors conclude that the most reliable distinction can be made between Irish and non-Irish songs.


[Figure 6.6 panels: a) log(pi(1)/pi(2)) for five different periods (b. 1600, 1600-1720, 1720-1800, 1800-1880, from 1880); b) log(pi(1)/pi(2)) for 'classic' (birth 1720-1800) vs. 'not classic' (birth before 1720 or 1800 and later).]

Figure 6.6 Comparison of log odds ratios log(π1/π2) of stationary Markov chain distributions of torus distances.

[Figure 6.7 panels: a) log(pi(1)/pi(3)) for five different periods; b) log(pi(1)/pi(3)) for 'up to baroque' (birth before 1720) vs. 'after baroque' (birth 1720 and later).]

Figure 6.7 Comparison of log odds ratios log(π1/π3) of stationary Markov chain distributions of torus distances.


[Figure 6.8 panels: a) log(pi(2)/pi(3)) for five different periods; b) log(pi(2)/pi(3)) for 'up to baroque' vs. 'after baroque'.]

Figure 6.8 Comparison of log odds ratios log(π2/π3) of stationary Markov chain distributions of torus distances.

[Figure 6.9 panels: a) log(pi(1)/pi(3)) for Bach and Schumann; b) log(pi(2)/pi(3)) for Bach and Schumann.]

Figure 6.9 Comparison of log odds ratios log(π1/π3) and log(π2/π3) of stationary Markov chain distributions of torus distances.


Figure 6.10 Comparison of stationary Markov chain distributions of torus distances.

[Figure 6.11 panels: a) log(pi(1)/pi(3)) plotted against date of birth; b) log(pi(2)/pi(3)) plotted against date of birth (horizontal axis: year, 1200-1800).]

Figure 6.11 Log odds ratios log(π1/π3) and log(π2/π3) plotted against date of birth of composer.


Figure 6.12 Johannes Brahms (1833-1897). (Courtesy of Zentralbibliothek Zurich.)

6.3.4 Reconstructing scores from acoustic signals

One of the ultimate dreams of musical signal recognition is to reconstruct a musical score from the acoustic signal of a musical performance. This is a highly complex task that has not yet been solved in a satisfactory manner. Consider, for instance, the problem of polyphonic pitch tracking defined as follows: given a musical audio signal, identify the pitches of the music. This problem is not easy for at least two reasons: a) different instruments have different harmonics and a different change of the spectrum; and b) in polyphonic music, one must be able to distinguish different voices (pitches) that are played simultaneously by the same or different instruments. An approach based on a rather complex hierarchical model is proposed for instance in Walmsley, Godsill, and Rayner (1999). Suppose that a maximal number N of notes can be played simultaneously and denote by ν = (ν1, ..., νN)t the vector of 0-1-variables indicating whether note j (j = 1, ..., N) is played or not. Each note j is associated with a harmonic representation (see Chapter 4) with fundamental frequency j and amplitudes b1(j), ..., bk(j) (k = number of harmonics). Time is divided into


disjoint time intervals, so-called frames. In each frame i of length mi, the sound signal is assumed to be equal to yi(t) = µi(t) + ei(t), where µi(t) (t = 1, ..., mi) is the sum of the harmonic representations of the notes and a random noise ei. Walmsley et al. assume ei to be iid (independent identically distributed) normal with zero mean and variance σi². Taking everything together, the probability distribution of the acoustic signal is fully specified by a finite dimensional parameter vector θ. In principle, given an observed signal, θ could be estimated by maximizing the likelihood (see Chapter 4). The difficulty is, however, that the dimension of θ is very high compared to the number of observations. The solution proposed by Walmsley et al. is to circumvent this problem by a Bayesian approach, in that θ is assumed to be generated by an a priori distribution. Given the data, consisting of a sound signal and an a priori distribution p(θ), the a posteriori distribution p(θ|yi) of θ is given by

p(θ|yi) = f(yi|θ) p(θ) / ∫ f(yi|θ) p(θ) dθ   (6.16)

where

f(yi|θ) = (2πσi²)^(−mi/2) exp(−∑_{t=1}^{mi} ei²(t)/(2σi²))

and ei(t) = ei(t; θ). How many notes and which pitches are played can then be decided, for instance, by searching for the mode of the distribution.

Even if this model is assumed to be realistic, a major practical difficulty remains: the dimension of θ can be several hundred. The computation of the a posteriori distribution is therefore very difficult, since calculation of ∫ f(yi|θ)p(θ)dθ involves high-dimensional numerical integration. A further complication is that some of the parameters may be highly correlated. Walmsley et al. therefore propose to use Markov chain Monte Carlo methods (see e.g. Gilks et al. 1996). The essential idea is to simulate the integral by a sample mean of f(yi|θ), where θ is sampled randomly from the a priori distribution p(θ). Sampling can be done by using a Markov process whose stationary distribution is p. The simulation can be simplified further by the so-called Gibbs sampler, which uses suitable one-dimensional conditional distributions (Besag 1989).

A more modest task than polyphonic pitch tracking is automatic segmentation of monophonic music. The task is as follows: given a monophonic musical score and a sampled acoustic signal of a performance of the score, identify for each note and rest in the score the corresponding time interval in the performance. A possible approach based on hidden Markov processes and Bayesian models is proposed in Raphael (1999) (also see Raphael 2001a,b). Raphael, who is a professional oboist and a mathematical statistician, also implemented his method in a computer system, called Music Plus One, that performs the role of a musical accompanist.
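The MCMC idea mentioned above for the Walmsley et al. model — drawing a Markov chain whose stationary distribution is the posterior, instead of computing the normalizing integral — can be illustrated with a toy random-walk Metropolis sampler. The target here is a standard normal "posterior", a stand-in chosen for illustration, not the audio model of the text:

```python
import math, random

# Toy Metropolis sampler: the target is an (unnormalized) standard normal
# density, standing in for a posterior whose normalizing constant is unknown.

def log_target(theta):
    # log of an unnormalized N(0, 1) density
    return -0.5 * theta * theta

def metropolis(n, step=1.0, seed=1):
    rng = random.Random(seed)
    theta, out = 0.0, []
    for _ in range(n):
        prop = theta + rng.gauss(0.0, step)
        # accept with probability min(1, target(prop)/target(theta))
        if math.log(rng.random()) < log_target(prop) - log_target(theta):
            theta = prop
        out.append(theta)
    return out

draws = metropolis(20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 2), round(var, 2))   # should be near 0 and 1
```

Only ratios of the target density enter the acceptance step, which is exactly why the intractable denominator of (6.16) never has to be evaluated.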


CHAPTER 7

Circular statistics

7.1 Musical motivation

Many phenomena in music are circular. The best known examples are repeated rhythmic patterns, the circles of fourths and fifths, and scales modulo octave in the well-tempered system. In the circle of fourths, for example, one progresses by steps of a fourth and arrives, after 12 steps, at the initial starting point modulo octave. It is not immediately clear whether and how to "calculate" in such situations, and what type of statistical procedures may be used. The theory of circular statistics has been developed to analyze data on circles where angles have a meaning. Originally, this was motivated by data in biology (e.g. direction of bird flight), meteorology (e.g. direction of wind), and geology (e.g. magnetic fields). Here we give a very brief introduction, mostly to descriptive statistics. For an extended account of methods and applications of circular statistics see, for instance, Mardia (1972), Batschelet (1981), Watson (1983), Fisher (1993), and Jammalamadaka and SenGupta (2001). In music, circular methods can be applied to situations where angles measure a meaningful distance between points on the circle and arithmetic operations in the sense of circular data are well defined.

7.2 Basic principles

7.2.1 Some descriptive statistics

Circular data are observations on a circle. In other words, observations consist of directions expressed in terms of angles. The first question is which statistics describe the data in a meaningful way or, at an even more basic level, how to calculate at all when "moving" on a circle. The difficulty can be seen easily by trying to determine the "average direction". Suppose we observe two angles ϕ1 = 330° and ϕ2 = 10°. It is plausible to say that the average direction is 350°. However, the average is (330° + 10°)/2 = 170°, which is almost the opposite direction. Calculating the sample mean of angles is obviously not meaningful.

The simple solution is to interpret angular observations as vectors in the plane, with end points on the unit circle, and to apply vector addition instead of adding angles. Thus, we replace ϕi (i = 1, ..., n) by

xi = (cos ϕi, sin ϕi)t

where ϕi is measured anti-clockwise relative to the horizontal axis. The following descriptive statistics can then be defined.

Definition 47 Let

C = ∑_{i=1}^n cos ϕi,  S = ∑_{i=1}^n sin ϕi,  R = √(C² + S²).   (7.1)

The (vector of the) mean direction of ϕi (i = 1, ..., n) is equal to

x̄ = (cos ϕ̄, sin ϕ̄)t = (C/R, S/R)t   (7.2)

Equivalently, one may use the following

Definition 48 The (angle of the) mean direction of ϕi (i = 1, ..., n) is equal to

ϕ̄ = arctan(S/C) + π·1{C < 0} + 2π·1{C > 0, S < 0}   (7.3)

Moreover, we have

Definition 49 The mean resultant length of ϕi (i = 1, ..., n) is equal to

R̄ = R/n   (7.4)

Note that R is the length of the vector nx̄ obtained by adding all observed vectors. If all angles are identical, then R = n, so that R̄ = 1. In all other cases, we have 0 ≤ R̄ < 1. In the other extreme case with ϕi = 2πi/n (i.e. the angles are scattered uniformly over [0, 2π], so there are no clusters of directions), we have R̄ = 0. In this sense, R̄ measures the amount of concentration around the mean direction. This leads to

Definition 50 The sample circular variance of ϕi (i = 1, ..., n) is equal to

V = 1 − R̄   (7.5)
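The definitions above can be sketched directly; this reproduces the 330°/10° example from the text and illustrates R̄ numerically (angles are illustrative):

```python
import math

# Sketch of Definitions 47-49: mean direction and mean resultant length
# via vector addition rather than averaging angles.

def mean_direction_deg(angles_deg):
    C = sum(math.cos(math.radians(a)) for a in angles_deg)
    S = sum(math.sin(math.radians(a)) for a in angles_deg)
    # atan2 handles the quadrant correction terms of (7.3) automatically
    return math.degrees(math.atan2(S, C)) % 360

def mean_resultant_length(angles_deg):
    C = sum(math.cos(math.radians(a)) for a in angles_deg)
    S = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(C, S) / len(angles_deg)

print(round(mean_direction_deg([330, 10]), 6))        # 350.0, not (330+10)/2 = 170
print(round(mean_resultant_length([0, 180]), 12))     # 0.0: two opposite directions
```

The second line illustrates the caveat discussed next: R̄ = 0 for two opposite directions even though the data are not uniformly scattered.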

Note, however, that R̄ is not a perfect measure of concentration, since R̄ = 0 does not necessarily imply that the data are scattered uniformly. For instance, suppose n is even, ϕ2i+1 = π and ϕ2i = 0. Thus there are two preferred directions. Nevertheless, R̄ = 0.

Alternative measures of center and variability respectively are the median and the difference between the lower and upper quartile. The median direction is a direction Mn = ϕo determined as follows: a) find the axis (straight line through zero) such that the data are divided into two groups of equal size (if n is odd, then the axis passes through at least one point, otherwise through the midpoint between the two observations in the middle); b) take the direction ϕ on the chosen axis for which more of the points xi are closer to the point (cos ϕ, sin ϕ)t defined by ϕ. Similarly, the lower and upper quartiles, Q1, Q2, can be defined by dividing each of the halves into two halves again. An alternative measure of variability is then given by IQR = Q2 − Q1.

Since we are dealing with vectors in the two-dimensional plane, all quantities above can be expressed in terms of complex numbers. In particular, one can define trigonometric moments by

Definition 51 For p = 1, 2, ..., let

Cp = ∑_{i=1}^n cos(pϕi),  Sp = ∑_{i=1}^n sin(pϕi),  Rp = √(Cp² + Sp²)   (7.6)

C̄p = Cp/n,  S̄p = Sp/n,  R̄p = Rp/n   (7.7)

and

ϕ̄(p) = arctan(Sp/Cp) + π·1{Cp < 0} + 2π·1{Cp > 0, Sp < 0}   (7.8)

Then

mp = C̄p + iS̄p = R̄p e^{iϕ̄(p)}   (7.9)

is called the pth trigonometric sample moment.

For p = 1, this definition yields

m1 = C̄1 + iS̄1 = R̄1 e^{iϕ̄(1)}

with C1 = C, S1 = S, R1 = R and ϕ̄(1) = ϕ̄ as before. Similarly, we have

Definition 52 Let

Cop = ∑_{i=1}^n cos p(ϕi − ϕ̄(1)),  Sop = ∑_{i=1}^n sin p(ϕi − ϕ̄(1))   (7.10)

C̄op = Cop/n,  S̄op = Sop/n   (7.11)

ϕ̄o(p) = arctan(Sop/Cop) + π·1{Cop < 0} + 2π·1{Cop > 0, Sop < 0}   (7.12)

Then

mop = C̄op + iS̄op = R̄p e^{iϕ̄o(p)}   (7.13)

is called the pth centered trigonometric (sample) moment mop, centered relative to the mean direction ϕ̄(1).

Note, in particular, that ∑ sin(ϕi − ϕ̄(1)) = 0, so that mo1 = R̄1. An overview of descriptive measures of center and variability is given in Table 7.1.
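The remark above can be checked numerically: centered at the mean direction ϕ̄(1), the first sine moment vanishes and the first cosine moment equals R̄. The angles below are arbitrary illustrative values (radians):

```python
import math

# Numerical check that sum sin(phi_i - phi_bar) = 0 and that the first
# centered cosine moment equals the mean resultant length R-bar.

phis = [0.3, 1.1, 5.9, 0.7, 2.0]          # arbitrary angles in [0, 2*pi)
C = sum(math.cos(p) for p in phis)
S = sum(math.sin(p) for p in phis)
phi_bar = math.atan2(S, C)                 # mean direction
R_bar = math.hypot(C, S) / len(phis)       # mean resultant length

So1 = sum(math.sin(p - phi_bar) for p in phis) / len(phis)
Co1 = sum(math.cos(p - phi_bar) for p in phis) / len(phis)
print(round(So1, 12), round(Co1 - R_bar, 12))   # both 0 up to rounding
```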


Table 7.1 Some Important Descriptive Statistics for Circular Data

Name — Definition — Feature measured

Sample mean — x̄ = (C/R, S/R)t, with R = √(C² + S²) — Center (direction)
Mean resultant length — R̄ = R/n — Concentration
Mean direction — ϕ̄ = arctan(S/C) + π·1{C < 0} + 2π·1{C > 0, S < 0} — Center (angle)
Median direction — Mn = argmax g(ϕ), where g(ϕ) = ∑_{i=1}^n |π − |ϕi − ϕ|| — Center (angle)
Quartiles Q1, Q2 — Q1 = median of {ϕi : Mn − π ≤ ϕi ≤ Mn}, Q2 = median of {ϕi : Mn ≤ ϕi ≤ Mn + π} — Center of "left" and "right" half
Modal direction — M̂n = argmax f̂(ϕ), where f̂(ϕ) = estimate of density f — Center (angle)
Principal direction — a = first eigenvector of S = ∑_{i=1}^n xi xit — Center (direction, unit vector)
Concentration — λ1 = first eigenvalue of S — Variability
Circular variance — Vn = 1 − R̄ — Variability
Circular stand. dev. — sn = √(−2 log(1 − Vn)) — Variability
Circular dispersion — dn = (1 − √(C̄2² + S̄2²))/(2R̄²) — Variability
Mean deviation — Dn = π − (1/n) ∑_{i=1}^n |π − |ϕi − Mn|| — Variability
Interquartile range — IQR = Q2 − Q1 — Variability

7.2.2 Correlation and autocorrelation

A model for perfect "linear" association between two circular random variables ϕ, ψ is

ϕ = ±ψ + c (mod 2π)   (7.14)

where c ∈ [0, 2π) is a fixed constant. A sample statistic that measures how close we are to this perfect association is

rϕ,ψ = ∑_{i,j=1; i≠j}^n sin(ϕi − ϕj) sin(ψi − ψj) / √( ∑_{i,j=1; i≠j}^n sin²(ϕi − ϕj) · ∑_{i,j=1; i≠j}^n sin²(ψi − ψj) )   (7.15)

or

rϕ,ψ = det(n^{−1} ∑_{i=1}^n xi yit) / √( det(n^{−1} ∑_{i=1}^n xi xit) · det(n^{−1} ∑_{i=1}^n yi yit) )   (7.16)


where xi = (cos ϕi, sin ϕi)t and yi = (cos ψi, sin ψi)t. For a time series ϕt (t = 1, 2, ...) of circular data, this definition can be carried over to autocorrelations

r(k) = ∑_{i,j=1; i≠j}^n sin(ϕi − ϕj) sin(ϕi+k − ϕj+k) / ∑_{i,j=1; i≠j}^n sin²(ϕi − ϕj)   (7.17)

or

rϕ(k) = det(n^{−1} ∑_{i=1}^{n−k} xi xi+kt) / det(n^{−1} ∑_{i=1}^{n−k} xi xit)   (7.18)
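The sine-based coefficient (7.15) is straightforward to implement directly (an O(n²) double sum); a perfectly associated pair ψ = ϕ + c (mod 2π) should give r = 1. The angles below are arbitrary:

```python
import math

# Direct implementation of the circular correlation coefficient (7.15).

def circ_corr(phi, psi):
    n = len(phi)
    num = den1 = den2 = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sp = math.sin(phi[i] - phi[j])
            sq = math.sin(psi[i] - psi[j])
            num += sp * sq
            den1 += sp * sp
            den2 += sq * sq
    return num / math.sqrt(den1 * den2)

phi = [0.2, 1.4, 2.9, 4.4, 5.6]                  # arbitrary angles (radians)
psi = [(p + 1.0) % (2 * math.pi) for p in phi]   # psi = phi + c (mod 2pi)
print(round(circ_corr(phi, psi), 10))            # 1.0: perfect association
```

The shift c cancels inside each sine difference, which is why (7.15) is invariant under rotations of either variable; reflecting one variable (ψ = −ϕ) gives r = −1.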

7.2.3 Probability distributions

A probability distribution for circular data is a distribution F on the interval [0, 2π). The sample statistics defined in Section 7.2.1 are estimates of the corresponding population counterparts in Table 7.2. The most frequently used distributions are the uniform, cardioid, wrapped, von Mises, and mixture distributions.

Uniform distribution U([0, 2π)):

F(u) = P(0 ≤ ϕ ≤ u) = (u/2π)·1{0 ≤ u < 2π},

f(ϕ) = F′(ϕ) = (1/2π)·1{0 ≤ ϕ < 2π}.

In this case, µp = ρp = 0, the mean direction µϕ is not defined, and the circular standard deviation σ and dispersion δ are infinite. This expresses the fact that there is no preference for any direction and variability is therefore maximal.

Cardioid (or Cosine) distribution C(µ, ρ):

F(u) = [(ρ/π) sin(u − µ) + u/2π]·1{0 ≤ u < 2π}

and

f(u) = (1/2π)(1 + 2ρ cos(u − µ))·1{0 ≤ u < 2π}

where 0 ≤ ρ ≤ 1/2. In this case, µϕ = µ, ρ1 = ρ, µp = 0 (p ≥ 2) and δ = 1/(2ρ²). An interesting property is that this distribution tends to the uniform distribution as ρ → 0.


Table 7.2 Some important population statistics for circular data

Name — Definition — Feature

pth trigonometric moment — µp = ∫₀^{2π} cos(pϕ)dF(ϕ) + i∫₀^{2π} sin(pϕ)dF(ϕ) = µp,C + iµp,S = ρp e^{iµϕ(p)} — -
Mean direction — µϕ = arctan(µ1,S/µ1,C) + π·1{µ1,C < 0} + 2π·1{µ1,C > 0, µ1,S < 0} — Center (angle)
pth central trig. moment — µop = ∫₀^{2π} cos(p(ϕ − µϕ))dF(ϕ) + i∫₀^{2π} sin(p(ϕ − µϕ))dF(ϕ) = µop,C + iµop,S — -
Mean resultant length — ρ = |µ1| — Concentration
Median direction — M = {α : ∫_{α−π}^{α} dF(ϕ) = ∫_{α}^{α+π} dF(ϕ) = 1/2} — Center (angle)
Quartiles q1, q2 — q1 = median of {ϕ : M − π ≤ ϕ ≤ M} (25%-quantile), q2 = median of {ϕ : M ≤ ϕ ≤ M + π} (75%-quantile)
Modal direction — M = argmax f(ϕ) — Center (angle)
Principal direction — α = first eigenvector of Σϕ = E(XXt) — Center (direction)
Concentration — λ1 = first eigenvalue of Σϕ — Variability
Circular variance — υ = 1 − ρ — Variability
Circular stand. dev. — σ = √(−2 log(1 − υ)) — Variability
Circular dispersion — δ = (1 − ρ2)/(2ρ²) — Variability
Mean deviation — ∆ = π − ∫₀^{2π} |π − |ϕ − M|| dF(ϕ) — Variability
Interquartile range — IQR = q2 − q1 — Variability

Wrapped distribution:

Let X be a random variable with distribution function FX. The random variable ϕ = X (mod 2π) has a distribution Fϕ on [0, 2π) given by

Fϕ(u) = ∑_{j=−∞}^{∞} [FX(u + 2πj) − FX(2πj)]

If X has a density function fX, then the density function of ϕ is equal to

fϕ(u) = ∑_{j=−∞}^{∞} fX(u + 2πj).


An important special example is the wrapped normal distribution. The wrapped normal distribution WN(µ, ρ) is obtained by wrapping a normal distribution with E(X) = µ and var(X) = −2 log ρ (0 < ρ ≤ 1). This yields the circular density function

fϕ(u) = (1/2π)[1 + 2 ∑_{j=1}^{∞} ρ^{j²} cos j(u − µ)]·1{0 ≤ u < 2π}

Then, µϕ = µ, ρ1 = ρ, δ = (1 − ρ⁴)/(2ρ²), µp,C = ρ^{p²} and µp,S = 0 (p ≥ 1). For ρ → 0, we obtain the uniform distribution, and for ρ → 1 a distribution with point mass in the direction µϕ.

von Mises distribution M(µ, κ)

The most frequently used unimodal circular distribution is the von Mises distribution with density function

fϕ(u) = (1/(2πI0(κ))) e^{κ cos(u−µ)}·1{0 ≤ u < 2π}

where 0 ≤ κ < ∞, 0 ≤ µ < 2π and

I0 = (1/2π) ∫₀^{2π} exp(κ cos(v − µ)) dv = ∑_{j=0}^{∞} (1/(j!)²) (κ/2)^{2j}

is the modified Bessel function of the first kind and order 0. In this case, we have µϕ = µ, ρ1 = I1/I0, δ = (κI1/I0)^{−1}, µp,C = Ip/I0 and µp,S = 0 (p ≥ 1), where

Ip = ∑_{j=0}^{∞} (1/((j + p)! j!)) (κ/2)^{2j+p}

is a modified Bessel function of order p. For κ → 0, the M(µ, κ)-distribution converges to U([0, 2π)), and for κ → ∞ we obtain a point mass in the direction µϕ.
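The series expansion of I0 given above makes the von Mises density easy to evaluate directly; a Riemann sum over [0, 2π) confirms that it integrates to one. The parameter values are arbitrary:

```python
import math

# Sketch of the von Mises density M(mu, kappa), with I_0(kappa) computed
# from its series expansion; normalization checked by a Riemann sum.

def bessel_I0(kappa, terms=30):
    # I_0(kappa) = sum_j (kappa/2)^(2j) / (j!)^2
    return sum((kappa / 2.0) ** (2 * j) / math.factorial(j) ** 2
               for j in range(terms))

def von_mises_pdf(u, mu, kappa):
    return math.exp(kappa * math.cos(u - mu)) / (2.0 * math.pi * bessel_I0(kappa))

mu, kappa, n = 1.0, 2.0, 10000
grid = [2.0 * math.pi * k / n for k in range(n)]
total = sum(von_mises_pdf(u, mu, kappa) for u in grid) * (2.0 * math.pi / n)
print(round(total, 6))   # 1.0: the density integrates to one
```

Because the integrand is smooth and periodic, the simple left Riemann sum is already extremely accurate here.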

Mixture distribution:

All distributions above are unimodal. Distributions with more than one mode can be modeled, for instance, by mixture distributions

fϕ(u) = p1 fϕ,1(u) + ... + pm fϕ,m(u)

where 0 ≤ p1, ..., pm ≤ 1, ∑ pi = 1, and the fϕ,j are different circular probability densities.

7.2.4 Statistical inference

Statistical inference about population parameters is mainly developed for the distributions above. Classical methods can be found in Mardia (1972), Batschelet (1981), Watson (1983), and Fisher (1993). For recent results see e.g. Jammalamadaka and SenGupta (2001).

7.3 Specific applications in music

7.3.1 Variability and autocorrelation of notes modulo 12

Figure 7.1 Bela Bartok – statue by Varga Imre in front of the Bela Bartok Memorial House in Budapest. (Courtesy of the Bela Bartok Memorial House.)

The following analysis is done for various compositions: pitch is represented in Z12 with 0 set equal to the note (modulo 12) with the highest frequency in the composition. Given a note j in Z12, the corresponding circular point is then x = (x1, x2)t = (cos(2πj/12), sin(2πj/12))t. The following statistics are calculated: λ1, R̄, d and the maximal circular autocorrelation m = max_{1≤k≤10} |rϕ(k)|. The compositions considered here are:

Figure 7.2 Sergei Prokoffieff as a child. (Courtesy of Karadar Bertoldi Ensemble;www.karadar.net/Ensemble/.)


Figure 7.3 Circular representation of compositions by J. S. Bach (Präludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).


• J. S. Bach: Das Wohltemperierte Klavier I (all preludes and fugues)

• D. Scarlatti: Sonatas Kirkpatrick No. 49, 125, 222, 345, 381, 412, 440, 541

• B. Bartok (Figure 7.1): Bagatelles No. 1–3, Sonata for Piano (2nd movement)

• S. Prokoffieff (Figure 7.2): Visions fugitives No. 1–15.

To simplify the analysis, the upper envelope is considered for each composition. The data set that was available consists of played music. Thus, instead of the written score we are looking at its realization by a pianist. This results in some changes of onset times. In particular, some notes with equal score onset times are not played simultaneously. Strictly speaking, the analysis thus refers to the played music rather than the original score. In Figure 7.3, four representative compositions are displayed. Z12 is represented by a circle starting on top with 0 and proceeding clockwise as j ∈ Z12 increases. A composition is thus represented by pitches j1, ..., jn ∈ Z12, each pitch being represented by a dot on the circle. In order to visualize how frequent each note is, each point xi = (cos ϕi, sin ϕi)t (i = 1, ..., n), where ϕi = 2πji/12, is displaced slightly by adding a random number from a uniform distribution on [0, 0.1] to the angle ϕi. (This technique of exploratory data analysis is often referred to as "jittering"; see Chambers et al. 1983.) Moreover, to obtain an impression of the dynamic movement, successive points xi, xi+1 are joined by a line. The connections visualize which notes are likely to follow each other. Some clear differences are visible between the four plots: for Bach, the main movements take place along the edges, the main points and vertices corresponding to the D-major scale. The rather curious simple figure for Bartok's Bagatelle No. 3 stems from the continuous repetition of the same chromatic figure in the upper voice. For Prokoffieff one can see two main vertices that are positioned symmetrically with respect to the middle vertical line. This is due to the repetitive nature of the upper envelope. Figure 7.4 shows boxplots of λ1, R̄, d, and log m, comparing Bach, Scarlatti, Bartok and Prokoffieff. Variability is clearly lower for Bartok and Prokoffieff, independently of the specific statistic that is used. There are also some, but less extreme, differences with respect to the maximal autocorrelation m. As one may perhaps expect, Bartok has the highest values of m.

7.3.2 Variability and autocorrelation of note intervals modulo 12

The same analysis as above can be carried out for intervals between successive notes (Figure 7.5). Figure 7.6 shows that, again, variability is much lower for Bartok and Prokoffieff.


Figure 7.4 Boxplots of λ1, R̄, d and log m for notes modulo 12, comparing Bach, Scarlatti, Bartok, and Prokoffieff.


Figure 7.5 Circular representation of intervals of successive notes in the following compositions: J. S. Bach (Präludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).


Figure 7.6 Boxplots of λ1, R, d, and log m for note intervals modulo 12, comparing Bach, Scarlatti, Bartok, and Prokoffieff.


Figure 7.7 Circular representation of notes ordered according to the circle of fourths in the following compositions: J. S. Bach (Präludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).


Figure 7.8 Boxplots of λ1, R, d, and log m for notes ordered according to the circle of fourths, comparing Bach, Scarlatti, Bartok, and Prokoffieff.


Figure 7.9 Circular representation of intervals of successive notes ordered according to the circle of fourths in the following compositions: J. S. Bach (Präludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).


Figure 7.10 Boxplots of λ1, R, d, and log m for note intervals modulo 12 ordered according to the circle of fourths, comparing Bach, Scarlatti, Bartok, and Prokoffieff.


7.3.3 Notes and intervals on the circle of fourths

Alternatively, the analysis above can be carried out by ordering the notes according to the circle of fourths. Thus, a rotation by 360°/12 = 30° corresponds to a step of one fourth. The analogous plots are given in Figures 7.7 through 7.10. This specific circular representation makes some symmetries and their harmonic meaning more visible.
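The reordering itself can be sketched in one line, assuming (as is standard) that a perfect fourth is 5 semitones:

```python
# Sketch: positions on the circle of fourths.  Position k holds pitch class
# (5*k) mod 12, since a perfect fourth is 5 semitones; each step then
# corresponds to a rotation by 360/12 = 30 degrees.
circle_of_fourths = [(5 * k) % 12 for k in range(12)]
print(circle_of_fourths)   # [0, 5, 10, 3, 8, 1, 6, 11, 4, 9, 2, 7]
```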


CHAPTER 8

Principal component analysis

8.1 Musical motivation

Observations in music often consist of vectors. Consider, for instance, the tempo measurements for Schumann's Träumerei (Figure 2.3). In this case, the observational units are performances, and an observation consists of a tempo "curve", i.e. a vector of tempo measurements x(ti) at symbolic score onset times ti (i = 1, ..., p). The main question is what similarities and differences there are between the performances. Principal component analysis (PCA) provides an answer in the sense that the "most interesting", and hopefully interpretable, projections are found. In this chapter, a brief introduction to PCA is given. For a detailed account and references see e.g. Mardia et al. (1979), Anderson (1984), Dillon and Goldstein (1984), Seber (1984), Krzanowski (1988), Flury and Riedwyl (1988), and Johnson and Wichern (2002).

8.2 Basic principles

8.2.1 Definition of PCA for multivariate probability distributions

Algorithmic definition

Let X = (X1, ..., Xp)t be a random vector with expected value E(X) = µ and covariance matrix Σ. The following algorithm is defined:

• Step 0. Initialization: Set j = 1 and Z(1) = X.

• Step 1. Find a direction, i.e. a vector a(j) with |a(j)| = 1, such that the projection Zj = [a(j)]t Z(j) = a1(j) Z1(j) + ... + ap(j) Zp(j) has the largest possible variance.

• Step 2. Consider the part of Z(j) that is orthogonal to a(1), ..., a(j), i.e. set Z(j+1) = Z(j) − Zj a(j). If j = p, or all components of Z(j+1) have variance zero, then stop. Otherwise set j = j + 1 and go to Step 1.

The algorithm successively finds orthogonal directions a(1), a(2), ... such that the corresponding projections of Z have the largest variance among all projections that are orthogonal to the previous ones. A projection with a large variance is suitable for comparing, ranking, and classifying observations, since different random realizations of the projection tend to be widely scattered. In contrast, if a projection has a small variance, then individuals


do not differ very much with respect to that projection, and are therefore more difficult to distinguish.
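The stepwise algorithm can be sketched numerically. The following is an illustrative implementation (leading eigenvector per step, then deflation of the data), not an efficient way to compute a PCA; the test data are simulated:

```python
import numpy as np

def pca_directions(X):
    """Illustrative implementation of the stepwise algorithm above: at
    each step take the variance-maximizing unit vector (the leading
    eigenvector of the current covariance matrix), then remove
    ("deflate") the component along it."""
    Z = X - X.mean(axis=0)                 # work with centered data
    p = Z.shape[1]
    directions = []
    for _ in range(p):
        cov = np.cov(Z, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        if eigvals[-1] < 1e-12:            # all remaining variance is zero
            break
        a = eigvecs[:, -1]                 # eigh sorts eigenvalues ascending
        directions.append(a)
        Z = Z - np.outer(Z @ a, a)         # Step 2: Z(j+1) = Z(j) - Zj a(j)
    return np.array(directions)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.5])  # simulated data
A = pca_directions(X)
print(A.shape)   # three mutually orthonormal directions
```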

Definition via spectral decomposition of matrices

The algorithm given above has an elegant interpretation:

Theorem 18 (Spectral decomposition theorem) Let B be a symmetric p × p matrix. Then B can be written as

B = AΛAt = ∑_{j=1}^{p} λj a(j)[a(j)]t   (8.1)

where Λ = diag(λ1, ..., λp) is a diagonal matrix, the λj are the eigenvalues, and the columns a(j) of A the corresponding orthonormal eigenvectors of B, i.e. we have

Ba(j) = λj a(j)   (8.2)

|a(j)|² = [a(j)]t a(j) = 1, and [a(j)]t a(l) = 0 for j ≠ l   (8.3)

In matrix form, equation (8.3) means that A is an orthogonal matrix, i.e.

At A = I   (8.4)

where I denotes the identity matrix with Ijj = 1 and Ijl = 0 (j ≠ l).

This result can now be applied to the covariance matrix of a random vector X = (X1, ..., Xp)t:

Theorem 19 Let X be a p-dimensional random vector with expected value E(X) = µ and p × p covariance matrix Σ. Then

Σ = AΛAt   (8.5)

where the columns a(j) of A are eigenvectors of Σ and Λ is a diagonal matrix with eigenvalues λ1, ..., λp ≥ 0.

In particular, we may permute the sequence of the X-components such that the eigenvalues are ordered. We thus obtain:

Theorem 20 Let X be a p-dimensional random vector with expected value E(X) = µ and a p × p covariance matrix Σ. Then there exists an orthogonal matrix A such that

Σ = AΛAt   (8.6)

where the columns a(j) of A are eigenvectors of Σ and Λ is a diagonal matrix with eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp ≥ 0. Moreover, the covariance matrix of the transformed vector

Z = At(X − µ)   (8.7)


is equal to

cov(Z) = AtΣA = Λ   (8.8)

Note in particular that var(Z1) = λ1 ≥ var(Z2) = λ2 ≥ ... ≥ var(Zp) = λp, and the covariance matrix Σ may be approximated by a matrix

Σ(q) = ∑_{j=1}^{q} λj a(j)[a(j)]t

for a suitably chosen value q ≤ p. If a good approximation can be achieved for a relatively small value of q, then this means that most of the random variation in X occurs in a low-dimensional space spanned by the random vector Z(q) = (Z1, ..., Zq)t.

Definition 53 The transformation defined by Z = At(X − µ) is called the principal component transformation. The jth component of Z,

Zj = [At(X − µ)]j = (X − µ)t a(j)   (8.9)

is called the jth principal component of X. The jth column of A, i.e. the jth eigenvector a(j), is called the vector of principal component loadings.

In summary, the principal component transformation rotates the original random vector X − µ in such a way that the new coordinates Z1, ..., Zp are uncorrelated (orthogonal) and are ordered according to their importance with respect to characterizing the covariance structure of X.

The following result states that the algorithmic and the algebraic definitions are indeed the same:

Theorem 21 Consider U = btX where b = (b1, ..., bp)t and |b| = 1. Suppose that U is orthogonal (i.e. uncorrelated) to the first k principal components of X. Then var(U) is maximal, among all such projections, if and only if b = a(k+1), i.e. if U is the (k + 1)st principal component Zk+1.
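As a numerical illustration of Theorem 20 and equation (8.8), one can check that rotating by the eigenvector matrix diagonalizes the covariance. The covariance matrix below is made up:

```python
import numpy as np

# Numerical check: with A the matrix of eigenvectors of Sigma (eigenvalues
# sorted in decreasing order), A^t Sigma A is the diagonal matrix Lambda
# of equation (8.8).  Sigma is an invented covariance matrix.
Sigma = np.array([[4.0, 1.5, 0.5],
                  [1.5, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
lam, A = np.linalg.eigh(Sigma)       # eigh returns ascending eigenvalues
lam, A = lam[::-1], A[:, ::-1]       # reorder: lambda_1 >= ... >= lambda_p

Lambda = A.T @ Sigma @ A             # equation (8.8)
print(np.allclose(Lambda, np.diag(lam)))   # prints True
```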

8.2.2 Definition of PCA for observed data

The definition of principal components given above cannot be applied directly to data, since the expected value and covariance matrix are usually unknown. It can, however, be modified in an obvious way by replacing population quantities by suitable estimates. The simplest solution is to use the sample mean and the sample covariance matrix. For observed vectors x(i) = (x1(i), ..., xp(i))t (i = 1, 2, ..., n) one defines

µ̂ = x̄ = (1/n) ∑_{i=1}^{n} x(i)   (8.10)

and the estimate of the covariance matrix

Σ̂ = (1/n) ∑_{i=1}^{n} (x(i) − x̄)(x(i) − x̄)t.   (8.11)


The estimated jth vector of principal component loadings, â(j), is the standardized eigenvector corresponding to the jth-largest eigenvalue of Σ̂. The estimated principal component transformation is then defined by

z = Ât(x − x̄) = [(x − x̄)tÂ]t   (8.12)

where the columns of Â are equal to the orthogonal vectors â(j). Applying this transformation to the observed vectors x(1), ..., x(n) enables us to compare observations with respect to their principal components. The jth principal component of the ith observation is equal to

zj(i) = (x(i) − x̄)t â(j)   (8.13)

In other words, the ith observed vector x(i) − x̄ is transformed into a rotated vector z(i) = (z1(i), ..., zp(i))t with the corresponding observed principal components. In matrix form, we can define the n × p matrix of observations

X =
  ⎡ x1(1) x2(1) · · · xp(1) ⎤
  ⎢ x1(2) x2(2) · · · xp(2) ⎥
  ⎢   ...   ...         ... ⎥
  ⎣ x1(n) x2(n) · · · xp(n) ⎦   (8.14)

and the n × p matrix of observed principal components

Z =
  ⎡ z1(1) z2(1) · · · zp(1) ⎤
  ⎢ z1(2) z2(2) · · · zp(2) ⎥
  ⎢   ...   ...         ... ⎥
  ⎣ z1(n) z2(n) · · · zp(n) ⎦   (8.15)

so that

Z = (X − 1 x̄t)Â   (8.16)

where 1 = (1, ..., 1)t denotes the n-dimensional vector of ones, so that 1 x̄t subtracts the sample mean from each row of X. Note that the jth column z(j) = (zj(1), ..., zj(n))t consists of the observed jth principal components. Therefore, the sample variance of the jth principal components is given by

s²_{zj} = n⁻¹ ∑_{i=1}^{n} z²_j(i) = λ̂j.

If λ̂j is large, then the observed jth principal components zj(1), ..., zj(n) have a large sample variance, so that the observed values are scattered far apart.
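The empirical computation runs exactly as in equations (8.10), (8.11), and (8.16); a small sketch with simulated data (not from the book) also confirms that the sample variance of the jth component equals the jth eigenvalue:

```python
import numpy as np

# Empirical PCA: sample mean (8.10), sample covariance (8.11), rotation of
# the centered data matrix (8.16), and the identity "sample variance of
# the jth principal component = lambda_j".  Toy data, n = 50, p = 4.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

xbar = X.mean(axis=0)                           # equation (8.10)
S = (X - xbar).T @ (X - xbar) / len(X)          # equation (8.11)
lam, A = np.linalg.eigh(S)
lam, A = lam[::-1], A[:, ::-1]                  # descending eigenvalues

Z = (X - xbar) @ A                              # equation (8.16)
sample_var = (Z ** 2).mean(axis=0)              # columns of Z have mean 0
print(np.allclose(sample_var, lam))             # prints True
```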

8.2.3 Scale invariance?

The principal component transformation is based on the covariance matrix. It is therefore not scale invariant, since variances and covariances depend on the units in which the individual components Xj are measured. It is therefore often recommended to standardize all components. Thus, we replace each coordinate xj by (xj − x̄j)/sj, where x̄j = n⁻¹ ∑_{i=1}^{n} xj(i) and s²j = n⁻¹ ∑_{i=1}^{n} (xj(i) − x̄j)² (or s²j = (n − 1)⁻¹ ∑_{i=1}^{n} (xj(i) − x̄j)²).

8.2.4 Choosing important principal components

Since an orthogonal transformation does not change the length of vectors, the "total variability" of the random vector Z in (8.7) is the same as that of the original random vector X with covariance matrix Σ = (σij)i,j=1,...,p. More specifically, one defines the total variability by

Vtotal = tr(Σ) = ∑_{i=1}^{p} σii.   (8.17)

The singular value decomposition (spectral decomposition) of Σ then implies:

Theorem 22 Let Σ be a covariance matrix with spectral decomposition Σ = AΛAt. Then

Vtotal = tr(Σ) = ∑_{i=1}^{p} λi   (8.18)

Since the eigenvalues λi are ordered according to their size, we may therefore hope that the proportion of total variation

P(q) = (λ1 + ... + λq) / ∑_{i=1}^{p} λi   (8.19)

is close to one for a low value of q. If this is the case, then one may reduce the dimension of the random vector considerably without losing much information. For data, we plot P(q) = (λ1 + ... + λq)/∑ λi versus q and judge by eye from which point on the increase in P(q) is not worth the price of adding additional dimensions. Alternatively, we may plot the contribution of each eigenvalue, λj/∑ λi or λj itself, against j. This is the so-called scree graph. More formal tests, e.g. for testing which eigenvalues are nonzero or for comparing different eigenvalues, are available, however mostly under the rather restrictive assumption that the distribution of X is multivariate normal (see e.g. Mardia et al. 1979, Ch. 8.3.2).

In addition to the scree plot, the decision on the number of principal components is often also based on the (possibly subjective) interpretability of the components. The interpretation of principal components may be based on the coefficients a(j)_k and/or on the correlation between Zj and the coordinates of the original random vector X = (X1, ..., Xp)t. Note that since E(ZXt) = E(AtXXt) = AtΣ = AtAΛAt = ΛAt, var(Xk) = σkk and


var(Zi) = λi, the correlation between Zj and Xk is equal to

ρj,k = corr(Zj, Xk) = a(j)_k √(λj/σkk)   (8.20)

Analogously, for observed data we have the empirical correlations

ρ̂j,k = â(j)_k √(λ̂j/σ̂kk)   (8.21)

8.2.5 Plots

One of the main difficulties with high-dimensional data is that they cannot be represented directly in a two-dimensional display. Principal components provide a possible solution to this problem. The situation is particularly simple if the first two principal components explain most of the variability. In that case, the original data (x1(i), ..., xp(i))t (i = 1, 2, ..., n) may be replaced by the first two principal components (z1(i), z2(i))t (i = 1, 2, ..., n). Thus, z2(i) is plotted against z1(i). If more than two principal components are needed, then the plot of z2(i) versus z1(i) provides at least a partial view of the data structure, and further projections can be viewed in corresponding scatter plots of other components, or in symbol plots as described in Chapter 2. The scatter plots can be useful for identifying structure in the data. In particular, one may detect unusual observations (outliers) or clusters of similar observations.

8.3 Specific applications in music

8.3.1 PCA of tempo skewness

The 28 tempo curves in Figure 2.3, each consisting of measurements at p = 212 onset times, can be considered as n = 28 observations of a 212-dimensional random vector. Principal component analysis cannot be applied directly to these data. The reason is that PCA relies on estimating the p × p covariance matrix. The number of observations (n = 28) is much smaller than p. Therefore, not all elements of the covariance matrix can be estimated consistently, and an empirical PCA decomposition would be highly unreliable. A solution to this problem is to reduce the dimension p in a meaningful way. Here, we consider the following reduction: the onset-time axis is divided into 8 disjoint blocks A1, A2, A′1, A′2, B1, B2, A′′1, A′′2 of 4 bars each. For each part number i (i = 1, ..., 8) and each performance j (j = 1, ..., 28), we calculate the skewness measure

ηj(i) = (x̄ − M) / (Q2 − Q1)


[Figure: skewness of tempo plotted against part number 1, ..., 8.]

Figure 8.1 Tempo curves for Schumann's Träumerei: skewness for the eight parts A1, A2, A′1, A′2, B1, B2, A′′1, A′′2 for 28 performances, plotted against the number of the part.

where M is the median and Q1, Q2 are the lower and upper quartiles, respectively. Figure 8.1 shows ηj(i) plotted against i. An apparent pattern is the generally strong negative skewness in B2. (Recall that negative skewness can be created by extreme ritardandi.) Apart from that, however, Figure 8.1 is difficult to interpret directly. Principal component analysis helps to find more interesting features. Figure 8.3 shows the loadings for the first four principal components, which explain more than 80% of the variability (see Figure 8.2). The loadings can be interpreted as follows: the first component corresponds to a weighted average emphasizing the skewness values in the first half of the piece. The 28 performances apparently differ most with respect to ηj(i) during the first 16 bars of the piece (parts A1, A2, A′1, A′2). The second most important distinction between pianists is characterized by the second component. This component compares skewness for the A-parts with the values in B1 and B2. The third component essentially
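The quartile-based skewness measure itself is easy to compute; the toy tempo values below are invented to show how a strong local ritardando (a few very slow beats) drags the mean below the median and makes the measure negative:

```python
import numpy as np

# Sketch of the quartile-based skewness measure used above:
# eta = (mean - median) / (Q2 - Q1), with Q1, Q2 the lower and upper
# quartiles.  The tempo values are made up.
def quartile_skewness(x):
    q1, med, q2 = np.percentile(x, [25, 50, 75])
    return (np.mean(x) - med) / (q2 - q1)

rng = np.random.default_rng(1)
# 28 "normal" beats around tempo 120, plus two extremely slow beats
tempo = np.concatenate([120 + rng.normal(0, 5, size=28), [40.0, 30.0]])
print(quartile_skewness(tempo) < 0)   # prints True
```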


[Figure: screeplot of the variances of components 1 through 8; cumulative proportions of variance 0.355, 0.564, 0.709, 0.824, 0.889, 0.935, 0.971, 1.]

Figure 8.2 Schumann's Träumerei: screeplot for skewness.

compares the first with the second half. Finally, the fourth component essentially compares the odd-numbered with the even-numbered parts, excluding the end A′′1, A′′2. Components two to five are displayed in Figure 8.4, with z2 and z3 on the x- and y-axes respectively and rectangles representing z4 and z5. Note in particular that Cortot and Horowitz mainly differ with respect to the third principal component. Horowitz has a more extreme difference in skewness between the first and second halves of the piece. Also striking are the "outliers" Brendel, Ortiz, and Gianoli. The overall skewness, as represented by the first component, is quite extreme for Brendel and Ortiz. For comparison, their tempo curves are plotted in Figure 8.5 together with Cortot's and Horowitz's first performances. In view of the PCA one may now indeed see that in the tempo curves of Brendel and Ortiz there is a strong contrast between small tempo variations applied most of the time and occasional strong local ritardandi.


[Figure: four panels showing the loadings of the first, second, third, and fourth PCA components plotted against the parts A1, A2, A′1, A′2, B1, B2, A′′1, A′′2.]

Figure 8.3 Schumann's Träumerei: loadings for PCA of skewness.

[Figure: z3 plotted against z2, with rectangles representing z4 and z5; labeled points for Argerich, Arrau, Askenaze, Brendel, Bunin, Capova, Curzon, Davies, Demus, Eschenbach, Gianoli, Katsaris, Klien, Krust, Kubalek, Moiseiwitsch, Ney, Novaes, Ortiz, Schnabel, Shelley, Zak, Cortot (three recordings), and Horowitz (three recordings).]

Figure 8.4 Schumann's Träumerei: symbol plot of principal components z2, ..., z5 for PCA of tempo skewness.


[Figure: four tempo curves plotted against score onset time.]

Figure 8.5 Schumann's Träumerei: tempo curves by Cortot, Horowitz, Brendel, and Gianoli.

8.3.2 PCA of entropies

Consider the entropy measures E1, E2, E3, E4, E9, and E10 defined in Chapter 3. We ask the following question: is there a combination of entropy measures that enables us to distinguish "computationally" between various styles of composition? The following compositions are included in the study: Henry Purcell, 2 Airs (Figure 8.6) and Hornpipe; J.S. Bach, first movements of Cello Suites No. 1-6, Prelude and Fugue No. 1 and 8 from "Das Wohltemperierte Klavier"; W.A. Mozart, KV 1e, 331/1, 545/1; R. Schumann, op. 15, No. 2, 3, 4, 7, and op. 68, No. 2, 16; A. Scriabin, op. 51, No. 2, 4; F. Martin, Preludes No. 6, 7 (cf. Figures 8.11, 8.12). For each composition, we define the vector x = (x1, ..., x6)t = (E1, E2, E3, E4, E9, E10)t. The results of the PCA are displayed in Figures 8.7 through 8.10. The first principal component mainly consists of an average of the first four components and a comparison with E10 (Figure 8.8). The second component essentially includes a comparison between E9 and E10, whereas the third component is mainly a weighted average of E2, E9, and E10. Finally, the fourth component compares E2 and E3 with E1. According to the screeplot (Figure 8.7), the first three components already explain more than 95% of the variability. Scatterplots of the first three components (Figures 8.9 and 8.10), together with symbols representing the next two components, show a


clear clustering. For clarity, only three different names (Purcell, Bach, and Schumann) are written explicitly in the plots. Schumann turns out to be completely separated from Bach. Moreover, Purcell appears to lie somewhat outside the regions of Bach and Schumann, in particular in Figure 8.10. In conclusion, entropies, as defined above, do indeed seem to capture certain features of a composer's style.


[Figure: score of the Air, marked quarter note = 96.]

Figure 8.6 Air by Henry Purcell (1659-1695).


Figure 8.7 Screeplot for PCA of entropies.

Figure 8.8 Loadings for PCA of entropies.


[Figure: second vs. first principal component of the entropies; rectangles with width = 3rd component and height = 4th component; points labeled Purcell, Bach, and Schumann.]

Figure 8.9 Entropies – symbol plot of the first four principal components.

[Figure: third vs. second principal component; rectangles with width = 4th component and height = 5th component; points labeled Purcell, Bach, and Schumann.]

Figure 8.10 Entropies – symbol plot of principal components no. 2-5.


Figure 8.11 F. Martin (1890-1971). (Courtesy of the Société Frank Martin and Mrs. Maria Martin.)

Figure 8.12 F. Martin (1890-1971) – manuscript from 8 Preludes. (Courtesy of the Société Frank Martin and Mrs. Maria Martin.)


CHAPTER 9

Discriminant analysis

9.1 Musical motivation

Discriminant analysis, often also referred to under the more general notion of pattern recognition, answers the question of which category an observed item is most likely to belong to. A typical application in music is the attribution of an anonymous composition to a time period or even to a composer. Other examples are discussed below. A prerequisite for the application of discriminant analysis is that a "training data set" is available for which the correct answers are known. We give a brief introduction to the basic principles of discriminant analysis. For a detailed account see e.g. Mardia et al. (1979), Klecka (1980), Breiman (1984), Seber (1984), Fukunaga (1990), McLachlan (1992), Huberty (1994), Ripley (1995), Duda et al. (2000), and Hastie et al. (2001).

9.2 Basic principles

9.2.1 Allocation rules

Suppose that an observation x ∈ Rk is known to belong to one of p mutually exclusive categories G1, G2, ..., Gp. Associated with each category is a probability density fi(x) of X on Rk. This means that if an individual comes from group i, then the individual's random vector X has the probability distribution fi. The problem addressed by discriminant analysis is as follows: observe X = x, and try to guess which group the observation comes from. The aim is, of course, to make as few mistakes as possible. In probability terms this amounts to minimizing the probability of misclassification.

The solution is defined by a classification rule. A classification rule is a division of Rk into p disjoint regions: Rk = R1 ∪ R2 ∪ ... ∪ Rp, Ri ∩ Rj = ∅ (i ≠ j). The rule allocates an observation to group Gi if x ∈ Ri. More generally, we may define a randomized rule by allocating an observation to group Gi with probability ψi(x), where ∑_{i=1}^{p} ψi(x) = 1 for every x. The advantage of allowing random allocation is that discriminant rules can be averaged and the set of all random rules is convex, which makes it possible to find optimal rules. Note that deterministic rules are a special case, obtained by setting ψi(x) = 1 if x ∈ Ri and 0 otherwise.


9.2.2 Case I: Known population distributions

Discriminant analysis without prior group probabilities – the ML-rule

Assume that it is not known a priori which of the groups is more likely to occur; however, for each group the distribution fi is known exactly. This case is mainly of theoretical interest; it does, however, illustrate the essential ideas of discriminant analysis.

A plausible discriminant rule is the Maximum Likelihood Rule (ML-Rule): allocate x to group Gi, if

fi(x) = max_{j=1,...,p} fj(x)   (9.1)

If the maximum is reached for several groups, then x is considered to be in the union of these (for continuous distributions this occurs with probability zero). In the case of two groups, the ML-rule means that x is allocated to G1 if f1(x) > f2(x), or, equivalently,

log [f1(x)/f2(x)] > 0   (9.2)

In the case where all probability densities are normal with equal covariance matrices, we have:

Theorem 23 Suppose that each fi is a multivariate normal distribution with expected value µi and covariance matrix Σi. Suppose further that Σ1 = Σ2 = ... = Σp = Σ and det Σ > 0. Then the ML-rule is given as follows: allocate x to group Gi, if

(x − µi)tΣ⁻¹(x − µi) = min_{j=1,...,p} (x − µj)tΣ⁻¹(x − µj)   (9.3)

Note that the "Mahalanobis distance" di = (x − µi)tΣ⁻¹(x − µi) measures how far x is from the expected value µi, while taking into account covariances between the components of the random vector X = (X1, ..., Xp)t. In particular, for p = 2, x is allocated to G1 if

at(x − ½(µ1 + µ2)) > 0   (9.4)

where a = Σ⁻¹(µ1 − µ2). Thus, we obtain a linear rule in which x is compared with the midpoint between µ1 and µ2.

Discriminant analysis with prior group probabilities – the Bayesian rule

Sometimes one has a priori knowledge (or belief) about how likely each of the groups is to occur. Thus, it is assumed that we know the probabilities

πi = P(observation drawn from group Gi)  (i = 1, ..., p)   (9.5)

where 0 ≤ πi ≤ 1 and ∑ πi = 1. The conditional likelihood that the observation comes from group Gi, given the observed value X = x, is proportional


to πi fi(x). The natural rule is then the Bayes rule: allocate x to Gi, if

πi fi(x) = max_{j=1,...,p} πj fj(x)   (9.6)

For the "noninformative prior" π1 = π2 = ... = πp = 1/p, representing complete lack of knowledge about which groups observations are more likely to come from, the Bayes rule coincides with the ML-rule. In the case of two groups, the Bayes rule is a simple modification of the ML-rule, since x is allocated to G1 if

log [f1(x)/f2(x)] > log (π2/π1)   (9.7)

Which rule is better?

The quality of a rule is judged by the probability of correct classification (or misclassification). There are two standard ways of comparing classification rules: (a) comparison of individual probabilities of correct classification; and (b) comparison of the overall probability of correct classification.

The first criterion can be understood as follows: for a random allocation rule with probabilities ψi(·), the probability that a randomly chosen individual coming from group Gi is classified into group Gj is equal to

pji = ∫ ψj(x) fi(x) dx   (9.8)

Thus, correct classification for individuals from group Gi occurs with probability pii and misclassification with probability 1 − pii. A rule r with correct-classification probabilities pii is said to be at least as good as a rule r̃ with probabilities p̃ii, if pii ≥ p̃ii for all i. If there is at least one ">" sign, then r is better. If there is no better rule than r, then r is called admissible. Consider now a Bayes rule r with probabilities pii. Is there any better rule than r? Suppose that a rule r̃, with allocation probabilities ψ̃i and correct-classification probabilities p̃ii, is better. Then

∑ πi pii < ∑ πi p̃ii.

On the other hand,

∑ πi p̃ii = ∑ ∫ ψ̃i πi fi(x) dx ≤ ∑ ∫ ψ̃i max_j {πj fj(x)} dx = ∫ max_j {πj fj(x)} dx.

Since r is a Bayes rule, we have

max_j {πj fj(x)} = ∑ ψi πi fi(x)

so that finally

∫ ∑ ψi πi fi(x) dx = ∑ πi pii ≥ ∑ πi p̃ii


which contradicts the first inequality. The conclusion is therefore that every Bayes rule is optimal in the sense that it is admissible. If there are no a priori probabilities πi, or, more exactly, if the noninformative prior is used, then this means that the ML-rule is optimal.

The second criterion is applicable if a priori probabilities are available: the probability of correct allocation is

pcorrect = ∑_{i=1}^{p} πi pii = ∑_{i=1}^{p} πi ∫ ψi fi(x) dx   (9.9)

A rule is optimal if pcorrect is maximal. In contrast to admissibility, all rules can be ordered according to "classification correctness". As before, it can be shown that the Bayes rule is optimal.

Both criteria can be generalized to the case where misclassification is associated with costs that may differ for different groups.

9.2.3 Case II: Population distribution form known, parameters unknown

Suppose that each fi is known, except for a finite-dimensional parameter vector θi. Then the rules above can be adapted accordingly, replacing parameters by their estimates. The ML-rule is then: allocate x to Gi, if

fi(x; θ̂i) = max_{j=1,...,p} fj(x; θ̂j)   (9.10)

The Bayes rule allocates x to Gi, if

πi fi(x; θ̂i) = max_{j=1,...,p} πj fj(x; θ̂j)   (9.11)

The rule becomes particularly simple if the fi are normal with unknown means µi and equal covariance matrices Σ1 = Σ2 = ... = Σ. Let x̄i be the sample mean and Σ̂i the sample covariance matrix for observations from group Gi. Estimating the common covariance matrix Σ by

Σ̂ = (n1Σ̂1 + n2Σ̂2 + ... + npΣ̂p)/(n − p)

where ni is the number of observations from Gi and n = n1 + ... + np, the ML-rule allocates x to Gi, if

(x − x̄i)tΣ̂⁻¹(x − x̄i) = min_{j=1,...,p} (x − x̄j)tΣ̂⁻¹(x − x̄j)   (9.12)

For two groups, we have the linear ML-rule

ât(x − ½(x̄1 + x̄2)) > 0   (9.13)

where â = Σ̂⁻¹(x̄1 − x̄2), and the corresponding Bayes rule

ât(x − ½(x̄1 + x̄2)) > log (π2/π1)   (9.14)
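A sketch of this estimated two-group rule with simulated training data (the group means and sample sizes are invented):

```python
import numpy as np

# Two-group linear rule with estimated parameters: pooled covariance
# estimate, then allocate by the sign of a^t (x - (xbar1 + xbar2)/2),
# as in (9.13)-(9.14).  Training data are simulated.
rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], size=(40, 2))    # sample from group G1
X2 = rng.normal(loc=[3.0, 2.0], size=(40, 2))    # sample from group G2
n1, n2 = len(X1), len(X2)

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - xbar1).T @ (X1 - xbar1) / n1
S2 = (X2 - xbar2).T @ (X2 - xbar2) / n2
S = (n1 * S1 + n2 * S2) / (n1 + n2 - 2)          # pooled estimate, as above
a = np.linalg.solve(S, xbar1 - xbar2)

def allocate(x, pi1=0.5, pi2=0.5):
    # Bayes rule (9.14); with pi1 = pi2 it reduces to the ML-rule (9.13)
    return 1 if a @ (x - 0.5 * (xbar1 + xbar2)) > np.log(pi2 / pi1) else 2

print(allocate(np.array([-0.5, 0.0])))   # deep in G1 territory: prints 1
print(allocate(np.array([3.5, 2.0])))    # deep in G2 territory: prints 2
```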


It should be emphasized here that while a linear discriminant rule is meaningful for the normal distribution, this may not be so for other distributions. For instance, if for G1 a one-dimensional random variable X is observed with a uniform distribution on [−1, 1], and for G2 the variable X is uniformly distributed on [−3, −2] ∪ [2, 3], then the two groups can be distinguished perfectly, however not by a linear rule.

9.2.4 Case III: Population distributions completely unknown

If the population distributions fi are completely unknown, then the search for reasonable rules is more difficult. In the recent literature, some rules based on nonparametric estimation or suitable projection techniques have been proposed (see e.g. Friedman 1977, Breiman 1984, Hastie et al. 1994, Polzehl 1995, Ripley 1995, Duda et al. 2000, Hand et al. 2001).

The simplest, and historically most important, rule is based on Fisher's

linear discriminant function. Fisher postulated that a linear rule may oftenbe reasonable (see however the remark in Section 9.2.3 why this need notalways be so). He proposed to find a vector a such that the linear functionatx maximizes the ratio between the variability between groups comparedto the variability within the groups. More specifically, define

Xn×p = X

to be the n× p matrix where each row i corresponds to an observed vectorxi = (xi1, ..., xip)t. We denote the columns of X by x(j) (j = 1, ..., p). Therows are assumed to be ordered according to groups, i.e. rows 1 to n1 areobservations from G1, rows n1 +1 through n1 +n2 are from G2 and so on.Moreover, define the matrix

Mn×n =M = I − n−11 · 1t

where I is the identity matrix and 1 = (1, ..., 1)t. We denote the subma-trices of X and M that belong to the different groups by X

(i)nj×p = X(j)

and M(j)nj×nj

= M (j) respectively. The corresponding subvectors of y =(y1, ..., yn)t are denoted by y(j). Then the variability of the vector y = Xa,defined by

SST =n∑i=1

(yi − y)2 = ytMy = atXtMXa (9.15)

can be written as

SST = SSTwithin + SSTbetween (9.16)

where

SSTwithin =p∑j=1

nj∑i=1

(y(j)i − y(j))2 = atWa (9.17)

©2004 CRC Press LLC

Page 232: Statistics in Musicology

and

SSTbetween =p∑j=1

nj(y(j) − y)2 = atBa (9.18)

Here,

W =p∑j=1

njSj =p∑j=1

[X(j)]tM (j)X(j)

is the within groups matrix and

B =p∑j=1

nj(x(j) − x)(x(j) − x)t

the between groups matrix, Sj is the sample covariance matrix of obser-vations xi from group Gj , y = n−1

∑pj=1

∑nj

i=1 y(j)i is the overall mean,

y(j) = n−1j

∑y(j)i the mean in group Gj and x(j) and x are the corre-

sponding (vector) means for x. Fisher’s linear discriminant function (orfirst canonical variate) is the linear function atx where a maximizes theratio

Q(a) =SSTbetweenSSTwithin

=atBa

atWa(9.19)

The solution is given byTheorem 24 Let a be the eigenvector of W−1B that corresponds to thelargest eigenvalue. Then Q(a) is maximal.The classification rule is then: allocate x to Gi, if

|atx− atx(i)| = minj=1,...,p

|atx− atx(j)| (9.20)

If there are only p = 2 groups, then

B = \frac{n_1 n_2}{n} (\bar{x}^{(1)} - \bar{x}^{(2)})(\bar{x}^{(1)} - \bar{x}^{(2)})^t

has rank 1 and the only non-zero eigenvalue is

tr(W^{-1}B) = \frac{n_1 n_2}{n} (\bar{x}^{(1)} - \bar{x}^{(2)})^t W^{-1} (\bar{x}^{(1)} - \bar{x}^{(2)})

with eigenvector a = W^{-1}(\bar{x}^{(1)} - \bar{x}^{(2)}). The discriminant rule then becomes the same as the ML-rule for normal distributions with equal covariance matrices: allocate x to G_1 if

(\bar{x}^{(1)} - \bar{x}^{(2)})^t W^{-1} \left( x - \tfrac{1}{2}(\bar{x}^{(1)} + \bar{x}^{(2)}) \right) > 0    (9.21)
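For two groups, the quantities above (the within-groups matrix W, the difference of group means, and rule (9.21)) can be sketched numerically. A minimal NumPy illustration; the data and the helper names are invented for illustration, not taken from the book:

```python
import numpy as np

# Two invented training groups in R^2
rng = np.random.default_rng(0)
x1 = rng.normal([0.0, 0.0], 0.5, size=(20, 2))   # group G1
x2 = rng.normal([2.0, 1.0], 0.5, size=(25, 2))   # group G2

def fisher_rule(x1, x2):
    """Return (a, threshold) for Fisher's two-group discriminant."""
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Within-groups matrix W = n1*S1 + n2*S2 (covariances with 1/n normalization)
    W = len(x1) * np.cov(x1, rowvar=False, bias=True) \
      + len(x2) * np.cov(x2, rowvar=False, bias=True)
    a = np.linalg.solve(W, m1 - m2)          # a = W^{-1}(xbar1 - xbar2)
    return a, a @ (m1 + m2) / 2

a, c = fisher_rule(x1, x2)

def allocate(x, a=a, c=c):
    """Allocate x to G1 if a^t x > a^t (xbar1 + xbar2)/2, cf. (9.21)."""
    return 1 if a @ x > c else 2

print(allocate(np.array([0.1, -0.2])))  # near G1's mean -> 1
print(allocate(np.array([2.1, 0.9])))   # near G2's mean -> 2
```

With well-separated groups, the rule recovers the obvious allocation; the midpoint threshold c is exactly the right-hand side of (9.21) rearranged.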

9.2.5 How good is an empirical discriminant rule?

If the densities f_i are not known, then the classification rule as well as the probabilities p_{ii} of correct classification must be estimated from the given


Figure 9.1 Discriminant analysis combined with time series analysis can be used to judge purity of intonation ("Elvira" by J.B.).

data. In principle this is easy, since the corresponding estimates can simply be plugged into the formula for p_{ii}. The observed data that are used for estimation are also called the "training sample". A problem with these estimates is, however, that the search for the optimal discriminant rule was done with the same data. Therefore, the estimated \hat{p}_{ii} will tend to be too optimistic (i.e. too large), unless n is very large. The same is true for any method that estimates classification probabilities from the training data. A possibility to avoid this is to partition the data set randomly into a "training" sample that is used for estimation of the discriminant rule, and a disjoint "validation" sample that is used for estimation of classification probabilities. Obviously, this can only be done for large enough data sets. For recently developed computational methods of validation, such as the bootstrap, see e.g. Efron (1979), Lauter (1985), Fukunaga (1990), Hirst (1996), LeBlanc and Tibshirani (1996), Davison and Hinkley (1997), Chernick (1999), Good (2001).
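The training/validation split just described can be sketched as follows. Everything below (the data, the simple nearest-group-mean rule) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented labelled data: two well-separated normal groups in R^2
n = 200
labels = rng.integers(0, 2, size=n)
x = rng.normal(0.0, 1.0, size=(n, 2)) + 3.0 * labels[:, None]

# Random disjoint training / validation partition
perm = rng.permutation(n)
train, valid = perm[:140], perm[140:]

# Nearest-group-mean rule estimated from the training part only
m0 = x[train][labels[train] == 0].mean(axis=0)
m1 = x[train][labels[train] == 1].mean(axis=0)
pred = (np.linalg.norm(x[valid] - m1, axis=1)
        < np.linalg.norm(x[valid] - m0, axis=1)).astype(int)

# Estimated probability of correct classification, computed on data
# that played no role in estimating the rule
p_correct = (pred == labels[valid]).mean()
print(round(p_correct, 2))
```

Because the validation sample is disjoint from the training sample, p_correct is free of the optimism that arises when the same data are reused.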

9.3 Specific applications in music

9.3.1 Identification of pitch, tone separation, and purity of intonation

Weihs et al. (2001) investigate objective criteria for judging purity of intonation of singing. The acoustic data are as described in Chapter 4. In order to address the question of how to computationally assess purity of intonation, a vocal expert classified 132 selected tones of 17 performances (Figure 9.1) of Handel's "Tochter Zion" into the classes "flat", "correct", and "sharp". The opinion of the expert is assumed to be the truth. An objective measure of purity is defined by \Delta = \log_{12}(\omega_{observed}) - \log_{12}(\omega_o)


where \omega_o is the correct basic frequency, corresponding to the note in the score and adjusted to the tuning of the accompanying piano, and \omega_{observed} is the actually measured frequency. Maximum likelihood discriminant analysis leads to the following classification rule: the maximal permissible error which is accepted in order to classify a tone as "correct" is about 0.4 halftones below and above the target tone. Note that this is much higher than 0.03 halftones, which is the minimal distance between frequencies a trained ear can in principle distinguish (see Pierce 1992). If a note is considered incorrect by an expert, then the estimated probability of being nevertheless classified as "correct" by the discriminant rule turns out to be 0.174. This rather high error rate may be due to several causes. "Purity of intonation" is a phenomenon that probably depends on more than just the basic frequency. Possible factors are, for instance, amount of vibrato, loudness, pitch, context (e.g. previous and subsequent notes), timbre, etc. Thus, more variables that characterize the sound may have to be incorporated, in addition to \Delta, in order to define a musically meaningful notion of "purity of intonation".
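The measure \Delta is a deviation on a logarithmic frequency scale. One common convention, assumed in this sketch, expresses the deviation directly in equal-tempered halftones as 12 log_2(\omega_{observed}/\omega_o):

```python
import math

def halftone_deviation(f_observed, f_target):
    """Deviation in equal-tempered halftones (one common convention)."""
    return 12 * math.log2(f_observed / f_target)

# A tone sung at 450 Hz against a target of 440 Hz:
delta = halftone_deviation(450.0, 440.0)
print(round(delta, 3))          # 0.389 halftones sharp

# Classified "correct" under the roughly +/- 0.4 halftone rule above
print(abs(delta) < 0.4)         # True
```

On this scale a deviation of 0.03 halftones, the perceptual threshold mentioned above, corresponds to a frequency ratio of only about 0.17%.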

9.3.2 Identification of historic periods

For a composition, consider notes modulo 12, with 0 being set equal to the most frequent note (which we will also call the "basic tone"). The relative frequencies of the notes 0, ..., 11 are denoted by p_0, ..., p_{11}. We then set x_1 = p_5. Note that if 0 is the root of the tonic triad, then 5 is the root of the subdominant. Moreover, we define

x_2 = E = -\sum_{i=0}^{11} \log(p_i + 0.001)\, p_i

which is a slightly modified measure of entropy. We now describe each composition by a bivariate observation

x = (p_5, E)^t.
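Computing x = (p_5, E) from a sequence of pitches can be sketched as follows. The rotation to the most frequent pitch class and the 0.001 offset follow the definitions above; the input encoding (integer pitch numbers, taken modulo 12) is our assumption:

```python
import math
from collections import Counter

def features(notes):
    """Map a list of integer pitch numbers to x = (p5, E).

    Pitch classes are taken modulo 12 and rotated so that the most
    frequent class becomes 0 (the "basic tone"), as in the text.
    """
    pcs = [note % 12 for note in notes]
    basic = Counter(pcs).most_common(1)[0][0]
    rotated = Counter((pc - basic) % 12 for pc in pcs)
    p = [rotated.get(i, 0) / len(pcs) for i in range(12)]
    # Slightly modified entropy, cf. the definition of E above
    E = -sum(math.log(p_i + 0.001) * p_i for p_i in p)
    return p[5], E

# A small invented fragment with pitch classes C=0, E=4, F=5, G=7
p5, E = features([0, 4, 7, 5, 0, 7, 4, 0])
print(p5)  # 0.125: one note out of eight is the subdominant root
```

Since the frequencies are rotated relative to the basic tone, transposing the input leaves (p_5, E) unchanged.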

The question is now whether this very simple 2-dimensional descriptive statistic can tell us anything about the time when the music was composed. In view of the somewhat naive simplicity of x, the answer is not at all obvious.

To simplify the problem, composers are divided into two groups: Group 1 = composers who died before 1800, and Group 2 = composers who died after 1800 (or are still alive). Essentially, the two groups correspond to the partition into "early music to baroque" and "classical till today". The compositions considered here are those given in the star plot example (Section 2.7.2). In order to be able to check objectively how the procedure works, only a subset of n = 94 compositions is used for estimation. Applying a linear discriminant rule partitions the plane into two half planes by


[Plot: entropy (horizontal axis) vs. P(subdominant) (vertical axis); fitted discriminant rule and training data used for estimation; legend: before 1800, after 1800.]

Figure 9.2 Linear discriminant analysis of compositions before and after 1800, with the training sample. The data used for the discriminant rule consist of x = (p_5, E).

a straight line. Figure 9.2 shows the estimated partitioning line together with the training sample (o = before 1800, x = after 1800). Apparently, the two groups can indeed be separated quite well by the estimated straight line. This is quite surprising, given the simplicity of the two variables. As expected, however, the partition is not perfect, and it does not seem to be possible to improve it by more complicated partitioning lines. To assess how well the rule may indeed classify, we consider 50 other compositions that were not used for estimating the discriminant rule. Figure 9.3 shows that the rule works well, since almost all observations in the validation sample are classified correctly. An unusual composition is Bartok's Bagatelle No. 3, which lies far on the left in the "wrong" group.

The partitioning can be improved if the time periods of the two groups are chosen farther apart. This is done in Figures 9.4 and 9.5 with Group 1 = "Early Music to Baroque" and Group 2 = "Romantic to 20th century". (A beautiful example of early music is displayed in Figure 9.6; also see Figures 9.7 and 9.8 for portraits of Brahms and Wagner.) Figure 9.4 shows the corresponding plot of the partition together with the data (n = 72). Compositions not used in the estimation are shown in Figure 9.5. Again, the rule works well, except for Bartok's third Bagatelle.


[Plot: entropy vs. P(subdominant); fitted discriminant rule and validation data not used for estimation; the Bartok composition is marked; legend: before 1800, after 1800.]

Figure 9.3 Linear discriminant analysis of compositions before and after 1800, with the validation sample. The data used for the discriminant rule consist of x = (p_5, E).

[Plot: entropy vs. P(subdominant); fitted discriminant rule and data used for estimation; legend: Early & Baroque, Romantic & 20th.]

Figure 9.4 Linear discriminant analysis of "Early Music to Baroque" and "Romantic to 20th Century". The points ("o" and "×") belong to the training sample. The data used for the discriminant rule consist of x = (p_5, E).


[Plot: entropy vs. P(subdominant); fitted discriminant rule and validation data not used for estimation; the Bartok composition is marked; legend: Early & Baroque, Romantic & 20th.]

Figure 9.5 Linear discriminant analysis of "Early Music to Baroque" and "Romantic to 20th century". The points ("o" and "×") belong to the validation sample. The data used for the discriminant rule consist of x = (p_5, E).

Figure 9.6 Graduale written for an Augustinian monastery of the diocese Konstanz, 13th century. (Courtesy of Zentralbibliothek Zurich.) (Color figures follow page 152.)


Figure 9.7 Johannes Brahms (1833-1897). (Photograph by Maria Fellinger, courtesy of Zentralbibliothek Zurich.)

Figure 9.8 Richard Wagner (1813-1883). (Engraving by J. Bankel after a painting by C. Jager, courtesy of Zentralbibliothek Zurich.)


CHAPTER 10

Cluster analysis

10.1 Musical motivation

In discriminant analysis, an optimal allocation rule between different groups is estimated from a training sample. The type and number of groups are known. In some situations, however, it is neither known whether the data can be divided into homogeneous subgroups nor how many subgroups there may be. How to find such clusters in previously ungrouped data is the purpose of cluster analysis. In music, one may for instance be interested in how far compositions or performances can be grouped into clusters representing different "styles". In this chapter, a brief introduction to basic principles of statistical cluster analysis is given. For an extended account of cluster analysis see e.g. Jardine and Sibson (1971), Anderberg (1973), Hartigan (1978), Mardia et al. (1979), Seber (1984), Blashfield et al. (1985), Hand (1986), Fukunaga (1990), Arabie et al. (1996), Gordon (1999), Hoppner et al. (1999), Everitt et al. (2001), Jajuga et al. (2002), Webb (2002).

10.2 Basic principles

10.2.1 Maximum likelihood classification

Suppose that observations x_1, ..., x_n ∈ R^k are realizations of n independent random variables X_i (i = 1, ..., n). Assume further that each random variable comes from one of p possible groups such that if X_i comes from group j, then it is distributed according to a probability density f(x; θ_j). In contrast to discriminant analysis, it is not observed which groups the x_i (i = 1, ..., n) belong to. Each observation x_i is thus associated with an unobserved parameter (or label) η_i specifying group membership. We may simply define η_i = j if x_i belongs to group j. Denote by η = (η_1, ..., η_n)^t the vector of labels and, for each j = 1, ..., p, let A_j = {x_i : 1 ≤ i ≤ n, η_i = j} be the unknown set of observations that belong to group j. Then the likelihood function of the observed data is

L(x_1, ..., x_n; \theta_1, ..., \theta_p, \eta_1, ..., \eta_n) = \prod_{j=1}^{p} \left\{ \prod_{x_i \in A_j} f(x_i; \theta_j) \right\}    (10.1)

Maximizing L with respect to the unknown parameters θ_1, ..., θ_p and η_1, ..., η_n, we obtain ML-estimates \hat{θ}_1, ..., \hat{θ}_p, \hat{η}_1, ..., \hat{η}_n and estimated sets


\hat{A}_1, ..., \hat{A}_p. Denoting by m the dimension of θ_j, the number of estimated parameters is p · m + n. This is larger than the number of observations. It can therefore not be expected that all parameters are estimated consistently. Nevertheless, the ML-estimate provides a classification rule due to the following property: suppose that we change one of the \hat{A}_j by removing an observation x_{i_0} from \hat{A}_j and putting it into another set \hat{A}_l (l ≠ j). Then the likelihood can at most become smaller. The new likelihood is obtained from the old one by dividing by f(x_{i_0}; \hat{θ}_j) and multiplying by f(x_{i_0}; \hat{θ}_l). We therefore have the following property

L(x_1, ..., x_n; \hat{\theta}_1, ..., \hat{\theta}_p, \hat{\eta}_1, ..., \hat{\eta}_n) \frac{f(x_{i_0}; \hat{\theta}_l)}{f(x_{i_0}; \hat{\theta}_j)} \le L(x_1, ..., x_n; \hat{\theta}_1, ..., \hat{\theta}_p, \hat{\eta}_1, ..., \hat{\eta}_n)    (10.2)

or, dividing by L (assuming that it is not zero),

f(x; \hat{\theta}_j) \ge f(x; \hat{\theta}_l) \quad \text{for } x \in \hat{A}_j    (10.3)

This is identical with the ML-allocation rule in discriminant analysis. The only, but essential, difference here is that η is unknown, i.e. our sample ("training data") gives us information only about the distribution of X but not about η. This makes the task much more difficult. In particular, since the number of unknown parameters is in general too large, maximum likelihood clustering can not only be computationally difficult but its asymptotic performance may not stabilize sufficiently. In special cases, however, a simple method can be obtained. Suppose, for instance, that the distributions in the groups are multivariate normal with means µ_j and covariance matrices Σ_j. Then the ML-estimates of these parameters, given η, are the group sample means

\bar{x}_j = \frac{1}{n_j(\eta)} \sum_{i \in A_j(\eta)} x_i

and group sample covariance matrices

\hat{\Sigma}_j(\eta) = \frac{1}{n_j(\eta)} \sum_{i \in A_j(\eta)} (x_i - \bar{x}_j(\eta))(x_i - \bar{x}_j(\eta))^t

respectively. The log-likelihood function then reduces to a constant minus \frac{1}{2}\sum_{j=1}^{p} n_j \log|\hat{\Sigma}_j|. Maximization with respect to η leads to the estimate

\hat{\eta} = \arg\min_{\eta} h(\eta)    (10.4)

where

h(\eta) = \prod_{j=1}^{p} |\hat{\Sigma}_j(\eta)|^{n_j(\eta)}    (10.5)

Computationally this means that the function h(η) is evaluated for all groupings η of the observations x_1, ..., x_n, and the estimate is the grouping


that minimizes h(η). Clearly, this is a computationally demanding task. A simpler rule is obtained if we assume that all covariance matrices are equal to a common covariance matrix Σ. Then

\hat{\eta} = \arg\min_{\eta} |\hat{\Sigma}| = \arg\min_{\eta} \left| n^{-1} \sum_{j=1}^{p} n_j \hat{\Sigma}_j \right| = \arg\min_{\eta} \left| \sum_{j=1}^{p} n_j \hat{\Sigma}_j \right|    (10.6)

Even in this simplified form, finding the best clustering is computationally demanding. For instance, if the data have to be divided into two groups, then the number of possible assignments for which |\sum_{j=1}^{2} n_j \hat{\Sigma}_j| may differ is equal to 2^{n-1}. In addition, if the number of groups is not known a priori, then a suitable, and usually computationally costly, method for estimating p must be applied. As a matter of principle, it should also be noted that if normal distributions or any other distributions with overlapping domains are assumed, then there are no perfect clusters. Even if the distributions were known, an observation x can be, with positive probability, from any group with f_i(x) > 0, so that one can never be absolutely sure where it belongs.

A variation of ML-clustering is obtained if the groups themselves are

associated with probabilities. Let π_j be the probability that a randomly sampled observation comes from group j. In analogy to the arguments above, maximization of the likelihood with respect to all parameters including π_j (j = 1, ..., p) leads to a Bayesian allocation rule with π_j as prior distribution.
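For very small n, the simplified criterion (10.6) can be evaluated by brute force over all 2^{n-1} essentially different two-group assignments. A sketch in one dimension, where the determinant of the pooled within-group scatter matrix reduces to a sum of squares; the data are invented:

```python
import itertools
import numpy as np

x = np.array([0.1, 0.3, 0.2, 5.0, 5.2, 4.9])  # invented 1-D data, two clear clusters

def within_scatter(assign):
    """|sum_j n_j Sigma_j| for 1-D data: pooled within-group sum of squares."""
    total = 0.0
    for j in (0, 1):
        xs = x[np.array(assign) == j]
        if len(xs):
            total += ((xs - xs.mean()) ** 2).sum()
    return total

# Exhaustive search; fixing the first label removes the group-relabelling
# symmetry, leaving the 2^(n-1) essentially different assignments
best = min((a for a in itertools.product((0, 1), repeat=len(x)) if a[0] == 0),
           key=within_scatter)
print(best)  # (0, 0, 0, 1, 1, 1): the obvious split minimizes the criterion
```

The exponential growth of the search space is exactly why practical clustering algorithms, such as the hierarchical methods of the next section, avoid exhaustive enumeration.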

10.2.2 Hierarchical clustering

ML-clustering yields a partition of observations into p groups. Sometimes it is desirable to obtain a sequence of clusters, e.g. starting with two main groups and then subdividing these into increasingly homogeneous clusters. This is particularly suitable for data where a hierarchy is expected - such as, for instance, in music. Generally speaking, a hierarchical method has the following property: a partitioning into p + 1 clusters consists of

• two clusters whose union is equal to one of the clusters from the partitioning into p groups;

• p − 1 clusters that are identical with p − 1 clusters of the partitioning into p groups.

In a first step, the data are transformed into a matrix D = (d_{ij})_{i,j=1,...,n} of distances or a matrix S = (s_{ij})_{i,j=1,...,n} of similarities. The definition of distance and similarity used in cluster analysis is more general than the usual definition of a metric:

Definition 54 Let X be an arbitrary set and d : X × X → R a real valued function such that for all x, y ∈ X


D1. d(x, y) = d(y, x)
D2. d(x, y) ≥ 0
D3. d(x, x) = 0

Then d is called a distance. If in addition we also have

D4. d(x, y) = 0 ⇔ x = y
D5. d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality),

then d is a metric.

A measure of similarity is usually assumed to have the following properties:

Definition 55 Let X be an arbitrary set and s : X × X → R a real valued function such that for all x, y ∈ X

S1. s(x, y) = s(y, x)
S2. s(x, y) > 0
S3. s(x, y) increases with increasing "similarity".

Then s is called a measure of similarity.

Axiom S3 is of course somewhat subjective, since it depends on what exactly is meant by "similarity". Table 10.1 gives examples of distances and measures of similarity.

Suppose now that, for an observed data set x_1, ..., x_n, we can define a distance matrix D = (d_{ij})_{i,j=1,...,n} where d_{ij} denotes the distance between the vectors x_i and x_j. A hierarchical clustering algorithm tries to group the data into a hierarchy of clusters in such a way that the distances within these clusters are generally much smaller than those between the clusters. Numerous algorithms are available in the literature. The reason for the variety of solutions is that in general the result depends on various "free choices", such as the sequence in which clusters are built or the definition of distance between clusters. For illustration, we give the definition of the complete linkage (or furthest neighbor) algorithm:

1. Set a threshold d_0.

2. Start with the initial clusters A^{(0)}_1 = {x_1}, ..., A^{(0)}_n = {x_n} and set i = 1. The distances between the clusters are defined by d^{(0)}_{jl} = d(A^{(0)}_j, A^{(0)}_l) = d(x_j, x_l). This gives the n × n distance matrix D^{(0)} = (d^{(0)}_{jl})_{j,l=1,...,n}.

3. Join the two clusters for which the distance d^{(i-1)}_{jl} is minimal, thus obtaining new clusters A^{(i)}_1, ..., A^{(i)}_{n-i}.

4. Calculate the new distances between clusters by

d^{(i)}_{jl} = d(A^{(i)}_j, A^{(i)}_l) = \max_{x \in A^{(i)}_j,\, y \in A^{(i)}_l} d(x, y)    (10.7)

and the corresponding (n-i) × (n-i) distance matrix D^{(i)} with elements d^{(i)}_{jl} (j, l = 1, ..., n-i).


Table 10.1 Some measures of distance and similarity between x = (x_1, ..., x_k)^t, y = (y_1, ..., y_k)^t ∈ R^k. For some of the distances, it is assumed that a data set of observations in R^k is available to calculate sample variances s_i^2 (i = 1, ..., k) and a k × k sample covariance matrix S.

Euclidian distance: d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}. Usual distance in R^k.

Pearson distance: d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2 / s_i^2}. Standardized Euclidian.

Mahalanobis distance: d(x, y) = \sqrt{(x - y)^t S^{-1} (x - y)}. Standardized Euclidian.

Manhattan metric: d(x, y) = \sum_{i=1}^{k} w_i |x_i - y_i| (w_i ≥ 0). Less sensitive to outliers.

Minkowski metric: d(x, y) = \left(\sum_{i=1}^{k} w_i |x_i - y_i|^{\lambda}\right)^{1/\lambda} (λ ≥ 1). For λ = 1: Manhattan.

Bhattacharyya distance: d(x, y) = \left(\sum_{i=1}^{k} (\sqrt{x_i} - \sqrt{y_i})^2\right)^{1/2}. For x_i, y_i ≥ 0 (example: proportions).

Binary similarity: s(x, y) = k^{-1} \sum x_i y_i. Suitable for x_i = 0, 1.

Simple matching coefficient: s(x, y) = k^{-1} \sum a_i, with a_i = x_i y_i + (1 - x_i)(1 - y_i). Suitable for x_i = 0, 1.

Gower's similarity coefficient: s(x, y) = 1 - k^{-1} \sum w_i |x_i - y_i|, with w_i = 1 if x_i is qualitative and w_i = 1/R_i if x_i is quantitative (R_i = range of the ith coordinate). Suitable for mixed qualitative and quantitative coordinates.

5. If

\max_{j,l=1,...,n-i} d^{(i)}_{jl} \le d_0    (10.8)

then stop. Otherwise, set i = i + 1 and go to step 3.

Note in particular that for the final clusters, the maximal distance within each cluster is at most d_0. As a result, the final clusters tend to be very "compact". A related method is the so-called nearest neighbor (single linkage) algorithm. It is identical with the above except that the distance between clusters is defined as the minimal distance between points in the two clusters. This can lead to so-called "chaining" in the form of elongated clusters.
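A minimal implementation of complete linkage might look as follows. The stopping rule used here is the common variant that merges while the smallest furthest-neighbour distance between clusters stays below the threshold; the data and threshold are invented:

```python
import numpy as np

def complete_linkage(points, d0):
    """Agglomerative complete linkage clustering (a minimal sketch)."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Furthest-neighbour distance between two clusters, cf. (10.7)
        return max(np.linalg.norm(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(dist(clusters[j], clusters[l]), j, l)
                 for j in range(len(clusters))
                 for l in range(j + 1, len(clusters))]
        dmin, j, l = min(pairs)
        if dmin > d0:          # no pair of clusters is close enough to merge
            break
        clusters[j] = clusters[j] + clusters[l]
        del clusters[l]
    return clusters

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.1]])
print(complete_linkage(pts, d0=1.0))  # [[0, 1], [2, 3]]
```

Because every merge requires all cross-distances between the two clusters to be at most d0, the final clusters indeed have within-cluster diameter at most d0. Replacing `max` by `min` in `dist` turns this into single linkage, with the chaining behaviour described above.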


For other algorithms and further properties see the references given at the beginning of this chapter, and references therein.

10.2.3 HISMOOTH and HIWAVE clustering

HISMOOTH and HIWAVE models, as defined in Chapter 5, can be used to extract dominating features of a time series y(t) that are related to an explanatory series x(t). Suppose that we have several y-series, y_j(t) (j = 1, ..., N), that share the same explanatory series x(t). An interesting question is then in how far features related to x(t) are similar, and which series have more in common than others. One way to answer the question consists of the following clustering algorithm:

1. For each series y_j(t), fit a HISMOOTH or HIWAVE model, thus obtaining a decomposition

y_j(t) = \hat{\mu}_j(t, x_t) + e_j(t)

where \hat{\mu}_j is the estimated expected value of y_j given x(t).

2. Perform a cluster analysis of the fitted curves \hat{\mu}_j(t, x_t).
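The two steps can be sketched generically. HISMOOTH itself is specific to this book, so a plain moving-average smoother stands in for step 1 (an assumption made purely for illustration); step 2 then compares the fitted curves by their pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)
x_t = np.sin(2 * np.pi * t)                  # shared explanatory series
# Three invented response series: two follow x(t), one does not
ys = [x_t + 0.1 * rng.normal(size=50),
      x_t + 0.1 * rng.normal(size=50),
      -x_t + 0.1 * rng.normal(size=50)]

def smooth(y, w=5):
    """Moving-average stand-in for the fitted curve mu_j (an assumption)."""
    return np.convolve(y, np.ones(w) / w, mode="same")

# Step 1: extract the dominating feature of each series
mus = [smooth(y) for y in ys]

# Step 2: cluster the fitted curves, here summarized by pairwise distances
d = [[round(float(np.linalg.norm(a - b)), 1) for b in mus] for a in mus]
print(d[0][1] < d[0][2])  # True: series 1 and 2 share the shape of x(t)
```

Clustering the smoothed curves rather than the raw series suppresses the noise e_j(t), which is the stabilization effect noted for the HISMOOTH fits in Section 10.3.4 below.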

10.3 Specific applications in music

10.3.1 Distribution of notes

Consider the distribution p_j (j = 0, 1, ..., 11) of notes modulo 12 as defined for the star plots in Chapter 2. Can the visual impression of the star plots in Figure 2.31 be confirmed by cluster analysis? We consider the transformed data vectors ζ = (ζ_1, ..., ζ_{11})^t, with ζ_j = log(p_j/(1 − p_j)), for the following compositions: 1) Anonymus: Saltarello (13th century); Saltarello (14th century); Troto (13th century); Alle psalite (13th century); 2) A. de la Halle (1235?-1287): Or est Bayard en la pature, hure!; 3) J. Ockeghem (1425-1495): Canon epidiatesseron; 4) J. Arcadelt (1505-1568): Ave Maria; La Ingratitud; Io dico fra noi; 5) W. Byrd (1543-1623): Ave Verum Corpus; Alman; The Queen's Alman; 6) J. Dowland (1562-1626): The Frog Galliard; The King of Denmark's Galliard; Come again; 7) H.L. Hassler (1564-1612): Galliarda; Kyrie from Missa Secunda; Sanctus et Benedictus from Missa Secunda; 8) Palestrina (1525-1594): Jesu! Rex admirablis; O bone Jesu; Pueri hebraeorum; 9) J.H. Schein (1586-1630): Banchetto musicale; 10) J.S. Bach (1685-1750): Preludes and Fugues 1-24 from "Das Wohltemperierte Klavier"; 11) J. Haydn (1732-1809): Sonata op. 34/3 (Figure 10.3); 12) W.A. Mozart (1756-1791): Sonata KV 545 (2nd Mv.); Sonata KV 281 (2nd Mv.); Sonata KV 332 (2nd Mv.); Sonata KV 333 (2nd Mv.); 13) C. Debussy (1862-1918): Claire de lune; Arabesque 1; Reflections dans l'eau; 14) A. Schonberg (1874-1951): op. 19/2 (Figure 10.4); 15) A. Webern (1883-1945): Orchesterstuck op. 6, No. 6; 16) Bartok (1881-1945): Bagatelles No.


[Dendrogram: "Distribution of notes modulo 12 - complete linkage"; leaf labels are the compositions by the composers listed above.]

Figure 10.1 Complete linkage clustering of log-odds-ratios of note-frequencies.

[Dendrogram: "Distribution of notes modulo 12 - single linkage"; leaf labels are the compositions by the composers listed above.]

Figure 10.2 Single linkage clustering of log-odds-ratios of note-frequencies.


Figure 10.3 Joseph Haydn (1732-1809). (Title page of a biography published by the Allgemeine Musik-Gesellschaft Zurich, 1830; courtesy of Zentralbibliothek Zurich.)

1-3; Piano Sonata (2nd Mv.); 17) O. Messiaen (1908-1992): Vingts regards de Jesu No. 3; 18) T. Takemitsu (1930-1996): Rain tree sketch No. 1.

Figure 10.1 shows the result of complete linkage clustering of the vectors (ζ_1, ..., ζ_{11})^t, based on the Euclidian distance and d_0 = 5. The most striking feature is the clear separation of "early music" from the rest. Moreover, the 20th century composers considered here are in a separate cluster, except for Bartok's Bagatelle No. 3 (and Debussy, who may be considered as belonging to the 19th and 20th centuries). In contrast, clusters provided by a single linkage algorithm are less easy to interpret. Figure 10.2 illustrates a typical result of this method, namely long narrow clusters where the maximal distance within a cluster can be quite large. In our example this does


Figure 10.4 Klavierstuck op. 19, No. 2 by Arnold Schonberg. (Facsimile; used bypermission of Belmont Music Publishers.)


[Dendrogram: "Clusters of entropies - complete linkage"; leaves: Bach Cello Suites I/1 through VI/1, and Preludes and Fugues 1 and 8 from WK I.]

Figure 10.5 Complete linkage clustering of entropies.

not seem appropriate, since, due to the "organic" historic development of music, the effect of chaining is likely to be particularly pronounced.

10.3.2 Entropies

Consider entropies as defined in Chapter 3. More specifically, we define for each composition a vector y = (E_1, ..., E_{10})^t. After standardization of each coordinate, cluster analysis is applied to the following compositions by J.S. Bach: Cello Suites No. I to VI (1st movement from each); Preludes and Fugues No. 1 and 8 from "Das Wohltemperierte Klavier" (each separately). The complete linkage algorithm leads to the clear separation of the Cello Suites from "Das Wohltemperierte Klavier" displayed in Figure 10.5.

10.3.3 Tempo curves

One of the obvious questions with respect to the tempo curves in Figure 2.3 is whether one can find clusters of similar performances. Applying complete linkage cluster analysis (with the Euclidian distance) to the raw data yields the clusters in Figure 10.6. Cortot and Horowitz appear to have very individual styles, since they build distinct clusters on their own. It should be noted, however, that this does not imply that other pianists do not have their own styles. Cortot and Horowitz simply happen to be the lucky ones


[Dendrogram: "Clusters of tempo curves - complete linkage"; leaves are the individual performances (Argerich, Arrau, Askenaze, Brendel, Bunin, Capova, Cortot 1-3, Curzon, Davies, Demus, Eschenbach, Gianoli, Horowitz 1-3, Katsaris, Klien, Krust, Kubalek, Moiseiwitsch, Ney, Novaes, Ortiz, Schnabel, Shelley, Zak).]

Figure 10.6 Complete linkage clustering of tempo.

who are represented more than once in the sample, so that the consistency of their performances can be checked empirically. Figure 10.6 also shows that Cortot is somewhat of an "outlier", since his cluster separates from all other pianists at the top level.

10.3.4 Tempo curves and melodic structure

Cluster analysis alone does not provide any further explanation of the meaning of observed clusters. In particular, we do not know which musically meaningful characteristics determine the clustering of tempo curves. In contrast, cluster analysis based on HISMOOTH or HIWAVE models provides a way to gain more insight. The fitted HISMOOTH curves in Figures 5.9a through d extract essential features that make comparisons easier. The estimated bandwidths can be interpreted as a measure of how much emphasis a pianist puts on global and local features respectively. Figure 10.7 shows clusters based on the fitted HISMOOTH curves. In contrast to the original data, complete and single linkage turn out to yield almost the same clusters. Thus, applying the HISMOOTH fit first leads to a stabilization of results. From Figure 10.7, we may identify about six main clusters, namely:

• A: KRUST, KATSARIS, SCHNABEL;


[Dendrogram: "Clusters of HISMOOTH fits - complete linkage"; leaves are the same performances as in Figure 10.6.]

Figure 10.7 Complete linkage clustering of HISMOOTH-fits to tempo curves.

• B: MOISEIWITSCH, NOVAES, ORTIZ;
• C: DEMUS, CORTOT1, CORTOT2, CORTOT3, ARGERICH, SHELLEY, CAPOVA;
• D: ARRAU, BUNIN, KUBALEK, CURZON, GIANOLI;
• E: ASKENAZE, DAVIES;
• F: HOROWITZ1, HOROWITZ2, HOROWITZ3, ZAK, ESCHENBACH, NEY, KLIEN, BRENDEL.

This is related to the grouping of the vector of estimated bandwidths, (b_1, b_2, b_3)^t ∈ R_+^3. In Figure 10.8, the x- and y-coordinates correspond to b_1 and b_2 respectively, and the radius of a circle is proportional to b_3. The letters A through F identify locations where one or more observations from that cluster occur. The picture shows that only a few distinct values of b_1 and b_2 occur. Particularly striking are the large bandwidths for clusters A and B. Apparently, these pianists emphasize mostly larger structures of the composition. Also note that the clusters do not separate equally well in each projection. Apart from clusters A and B, one cannot "order" the performances in terms of large versus small bandwidths. Overall, one may conclude that HISMOOTH-clustering together with analytic indicator functions provides a better understanding of essential characteristics of musical performance (Figure 10.9).


[Symbol plot: b_1 (horizontal axis) vs. b_2 (vertical axis); circles with radius proportional to b_3, labelled A-F by cluster.]

Figure 10.8 Symbol plot of HISMOOTH bandwidths for tempo curves. The radius of each circle is proportional to a constant plus log b_3; the horizontal and vertical axes are equal to b_1 and b_2 respectively. The letters A-F indicate where at least one observation from the corresponding cluster occurs.

Figure 10.9 Maurizio Pollini (*1942). (Courtesy of Philippe Gontier, Paris.)


CHAPTER 11

Multidimensional scaling

11.1 Musical motivation

In some situations data consist of distances only. These distances are not necessarily euclidian, so that they do not necessarily correspond to a configuration of points in a euclidian space. The question addressed by multidimensional scaling (MDS) is in how far one may nevertheless find points in a hopefully low-dimensional euclidian space that have exactly or approximately the observed distances. The procedure is mainly an exploratory tool that helps to find structure in distance data. We give a brief introduction to the basic principles of MDS. For a detailed discussion and an extended bibliography see, for instance, Kruskal and Wish (1978), Cox and Cox (1994), Everitt and Rabe-Hesketh (1997), Borg and Groenen (1997), Schiffman (1997); also see textbooks on multivariate statistics, such as the ones given in the previous chapters. For the origins of MDS and early references see Young and Householder (1941), Guttman (1954), Shepard (1962a,b), Kruskal (1964a,b), Ramsay (1977).

11.2 Basic principles

11.2.1 Basic definitions

In MDS, any symmetric n × n matrix D = (d_{ij})_{i,j=1,...,n} with d_{ij} ≥ 0 and d_{ii} = 0 is called a distance matrix. Note that this corresponds to the axioms D1, D2, and D3 in the previous chapter. If instead of distances a similarity matrix S = (s_{ij})_{i,j=1,...,n} is given, then one can define a corresponding distance matrix by a suitable transformation. One possible transformation is, for instance,

$d_{ij} = \sqrt{s_{ii} - 2s_{ij} + s_{jj}} \qquad (11.1)$
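As an illustration (not code from the book), transformation (11.1) can be written in a few lines of numpy; the function name `sim_to_dist` is our own:

```python
import numpy as np

def sim_to_dist(S):
    """Turn a symmetric similarity matrix S into a distance matrix
    via d_ij = sqrt(s_ii - 2*s_ij + s_jj), i.e. equation (11.1)."""
    S = np.asarray(S, dtype=float)
    diag = np.diag(S)
    D2 = diag[:, None] - 2.0 * S + diag[None, :]
    return np.sqrt(np.maximum(D2, 0.0))  # clip tiny negatives from rounding

# Small hypothetical similarity matrix:
S = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
D = sim_to_dist(S)
```

By construction D is symmetric with zero diagonal, so it satisfies the distance-matrix axioms D1–D3 mentioned above.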

The question addressed by metric MDS can be formulated as follows: given an n × n distance matrix D, can one find a dimension k and n points x1, ..., xn in R^k such that these points have a distance matrix D̂ with D̂ approximately, or even exactly, equal to D? Clearly one prefers low dimensions (k = 2 or 3, if possible), since it is then easy to display the points graphically. On the other hand, the dimension cannot be too low if one is to obtain a good approximation of D, and hence a realistic picture of structures in the data. As an alternative to metric MDS, one may also consider non-metric methods, where one tries to find points in a euclidian space such that the ranking of the distances remains the same, whereas their nominal values may differ.

11.2.2 Metric MDS

In the ideal case, the metric solution constructs n points x1, ..., xn ∈ R^k for some k such that their euclidian distance matrix D̂, with elements $\hat d_{ij} = \sqrt{(x_i - x_j)^t(x_i - x_j)}$, is exactly equal to the original distance matrix D. If this is possible, then D is called euclidian. The condition under which this is possible is as follows:

Theorem 25 $D = D_{n \times n} = (d_{ij})_{i,j=1,\dots,n}$ is euclidian if and only if the matrix

$B = B_{n \times n} = MAM$

is positive semidefinite, where $M = I - n^{-1}\mathbf{1}\mathbf{1}^t$, $I = I_{n \times n}$ is the identity matrix, $\mathbf{1} = (1, \dots, 1)^t$, and $A = A_{n \times n}$ has elements

$a_{ij} = -\tfrac{1}{2} d_{ij}^2 \quad (i, j = 1, \dots, n).$

The reason for positive semidefiniteness of B is that if D is indeed a euclidian matrix corresponding to points $x_1, \dots, x_n \in R^k$, then

$b_{ij} = (x_i - \bar{x})^t (x_j - \bar{x}) \qquad (11.2)$

so that B defines a "centered" scalar product for these points. In matrix form we have $B = (MX)(MX)^t$, where the n rows of $X_{n \times k}$ correspond to the vectors $x_i$ $(i = 1, \dots, n)$. Since for any matrix C the matrices $C^tC$ and $CC^t$ are positive semidefinite, so is B.

The construction of the points $x_1, \dots, x_n$ given $D = D_{n \times n}$ (or $B_{n \times n} \geq 0$) is done as follows: suppose that B is of rank $k \leq n$. Since B is a symmetric matrix, we have the spectral decomposition

$B = C\Lambda C^t = ZZ^t \qquad (11.3)$

where $\Lambda$ is the $n \times n$ diagonal matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_k > 0$ and $\lambda_j = 0$ $(j > k)$ on the diagonal, and $Z = Z_{n \times n} = (z_{ij})_{i,j=1,\dots,n}$ is the $n \times n$ matrix whose first k columns $z^{(j)}$ $(j = 1, \dots, k)$ are equal to the first k eigenvectors, scaled so that $z^{(j)t}z^{(j)} = \lambda_j$. Then the rows

$x_i = (z_{i1}, \dots, z_{ik})^t \quad (i = 1, \dots, n) \qquad (11.4)$

of Z are points in $R^k$ with distance matrix D.

In practice, the following difficulties can occur: 1. D is euclidian, but k is too large to be of any use (after all, the purpose is to obtain an interpretable picture of the data); 2. D is not euclidian, with a) all $\lambda_i$ positive, or b) some $\lambda_i$ negative. Because of these problems, one often uses a rough approximation of D, based on a small number of eigenvectors that correspond to positive eigenvalues.

Finally, note that if, instead of distances, similarities are given and the similarity matrix S is positive semidefinite, then S can be transformed into a euclidian distance matrix by defining

$d_{ij} = \sqrt{s_{ii} - 2s_{ij} + s_{jj}} \qquad (11.5)$
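The whole construction (double centering, spectral decomposition, coordinates from the scaled eigenvectors) is easy to sketch in numpy. This is our own illustrative implementation of classical metric MDS, not code from the book; the function name `classical_mds` is an assumption:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (metric) MDS. Given an n x n distance matrix D, return
    n points in R^k whose euclidian distances approximate D, following
    Theorem 25: B = M A M with a_ij = -d_ij^2 / 2."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    A = -0.5 * D ** 2
    M = np.eye(n) - np.ones((n, n)) / n        # centering matrix I - 11^t/n
    B = M @ A @ M
    lam, C = np.linalg.eigh(B)                 # eigenvalues in ascending order
    lam, C = lam[::-1], C[:, ::-1]             # reorder to descending
    lam_k = np.maximum(lam[:k], 0.0)           # ignore negative eigenvalues
    X = C[:, :k] * np.sqrt(lam_k)              # Z = C Lambda^(1/2)
    return X, lam

# Four points on a unit square: their distance matrix is euclidian, so
# the two-dimensional reconstruction is exact up to rotation/reflection.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
X, lam = classical_mds(D, k=2)
D_hat = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

Negative eigenvalues signal that D is not euclidian (difficulty 2b above); the sketch simply truncates them, which corresponds to the rough approximation mentioned in the text.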

11.2.3 Non-metric MDS

For qualitative data, or generally observations in non-metric spaces, distances can only be interpreted in terms of ranking. For instance, the subjective judgement of an audience may be that a composition by Webern is slightly more "difficult" than Wagner, but much more difficult than Mozart, thus defining a larger distance between Webern and Mozart than between Webern and Wagner. It may, however, not be possible to express distances between the compositions by numbers that could be interpreted directly. In such cases, D is often called a dissimilarity matrix rather than a distance matrix. Since only the relative size of distances is meaningful, various computationally demanding algorithmic methods have been developed in the literature for defining points in a euclidian space such that the ranking of the distances remains the same (e.g. Shepard 1962a,b, Kruskal 1964a,b, Guttman 1968, Lingoes and Roskam 1973).
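The computational core of these methods is a monotone, rank-preserving regression of the configuration distances on the dissimilarities, from which a badness-of-fit ("stress") is computed and then minimized over configurations. The following sketch of a Kruskal-type stress-1, built on the pool-adjacent-violators algorithm, is our own illustration and not the cited authors' code:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    blocks = []                                 # [mean, weight, count]
    for yi in map(float, y):
        blocks.append([yi, 1.0, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w, c1 + c2])
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return np.array(out)

def stress1(delta, d):
    """Kruskal's stress-1 for dissimilarities delta and configuration
    distances d (flattened in the same order)."""
    order = np.argsort(delta)                   # rank the dissimilarities
    d_sorted = d[order]
    d_hat = pava(d_sorted)                      # monotone fit preserving ranks
    return np.sqrt(((d_sorted - d_hat) ** 2).sum() / (d ** 2).sum())

delta = np.array([1.0, 2.0, 3.0])                    # toy dissimilarities
s_good = stress1(delta, np.array([0.5, 1.0, 2.0]))   # same ranking: stress 0
s_bad = stress1(delta, np.array([2.0, 1.0, 0.5]))    # reversed ranking
```

A non-metric MDS algorithm would then move the points iteratively to reduce this stress; note that only the ranking of delta enters the criterion.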

11.2.4 Chronological ordering

Suppose a distance matrix D (or a similarity matrix S) is given and one would like to find out whether there is a natural ordering of the observational units. For instance, a listener may assign a distance matrix between various musical pieces without knowing anything about these pieces a priori. The question then may be whether the listener's distance matrix corresponds approximately to the sequence in time when the pieces were composed. This problem is also called seriation. MDS provides a possible solution in the following way: if the distances expressed the temporal (or any other) sequence exactly, then the configuration of points found by MDS would be one-dimensional. In the more realistic case that distances are partially due to the temporal sequence, the points in R^k should be scattered around a one-dimensional, not necessarily straight, line in R^k. In the simplest case, this may already be visible in a two-dimensional plot.
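As a rough numerical check of this idea (our own sketch, not a method from the book; the function name and the dominance criterion are assumptions), one can inspect how dominant the first eigenvalue of the doubly centered matrix B is and read a candidate ordering off the first MDS coordinate:

```python
import numpy as np

def seriation_order(D):
    """Order units along the first MDS coordinate and report the share of
    the first eigenvalue among the positive ones; a share near 1 suggests
    an essentially one-dimensional (serial) structure in the distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    B = M @ (-0.5 * D ** 2) @ M
    lam, C = np.linalg.eigh(B)
    lam, C = lam[::-1], C[:, ::-1]              # descending eigenvalues
    pos = np.maximum(lam, 0.0)
    dominance = pos[0] / pos.sum()
    order = np.argsort(C[:, 0])                 # sort by first coordinate
    return order, dominance

# Units that truly lie on a line (positions 0, 1, 3, 6), presented shuffled:
x = np.array([3.0, 0.0, 6.0, 1.0])
D = np.abs(x[:, None] - x[None, :])
order, dom = seriation_order(D)
```

For a genuinely one-dimensional configuration the dominance is 1 up to rounding, and the recovered ordering agrees with the true sequence up to reversal (the sign of an eigenvector is arbitrary).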


11.3 Specific applications in music

11.3.1 Seriation by simple descriptive statistics

Suppose we would like to guess from which period a composition dates, without listening to the music, by using an algorithm instead. There is a large amount of music theory that can be used to determine the time when a composition was written. One may wonder, however, whether there may be a very simple computational way of guessing.

Consider, for instance, the following frequencies: $x_i = p_{i-1}$ $(i = 1, \dots, 12)$ are the relative frequencies of notes modulo 12 centered around the central tone, as defined in section 9.3.2. Moreover, set $x_{13}$ equal to the relative frequency of a sequence of four notes following the sequence of interval steps 3, 3, and 3. This corresponds to an arpeggio of the diminished seventh chord. Thus, we consider a vector $x = (x_1, \dots, x_{13})^t$ with coordinates corresponding to proportions. An appropriate measure of distance between proportions is the Bhattacharyya distance (Bhattacharyya 1946b) given in Table 10.1, namely

$d(x, y) = \left( \sum_{i=1}^{k} (\sqrt{x_i} - \sqrt{y_i})^2 \right)^{1/2}.$

This is not a euclidian distance, so it is not a priori clear whether a suitable representation of the observations in a euclidian space is possible. MDS with $k^* = 2$ yields the points in Figure 11.1. Three time periods are distinguished by using different symbols for the points. The periods are defined in a very simple way, namely by date of birth of the composer: a) before 1720 ("early to baroque"; see e.g. Figure 11.3); b) 1720-1880 ("classical to romantic"); and c) 1880 or later ("20th century"). The configuration of the respective points does show an "effect" of time. The three time periods can be associated with regional clusters, though the regions overlap. An outlier from the middle category is Schoenberg. This is due to the crude definition of the time periods: Schoenberg (in particular his op. 19/2) clearly belongs to the 20th century; he just happens to have been born a little too early (1874), and is therefore classified as "classical to romantic". The dependence between time period and the second MDS coordinate can also be seen by comparing boxplots (Figure 11.2).

11.3.2 Perception and music psychology

MDS is frequently used to analyze data consisting of subjective distances between musical sounds (e.g. with respect to pitch or timbre) or compositions, obtained in controlled experiments. Typical examples are Grey and Gordon (1978), Gromko (1993), Ueda and Ohgushi (1987), Wedin (1972), Wedin and Goude (1972), Markuse and Schneider (1995). Since it is not known to what extent the cognitive "metric" may correspond approximately to


Figure 11.1 Two-dimensional multidimensional scaling of compositions ranging from the 13th to the 20th century, based on frequencies of intervals and interval sequences.

a euclidian distance, MDS is a useful method to investigate this question, to simplify high-dimensional distance data and possibly find interesting structures. Grey and Gordon consider perceptual effects of timbres characterized by spectra. For a related study see Wedin and Goude (1972). Gromko (1993) carries out an MDS analysis to study perceptual differences between expert and novice music listeners. Ueda and Ohgushi (1987) study perceptual components of pitch and use MDS to obtain a spatial representation of pitch.


Figure 11.2 Boxplots of second MDS-component where compositions are classified according to three time periods.

Figure 11.3 Fragment of a graduale from the 14th century. (Courtesy of Zentralbibliothek Zurich.)


Figure 11.4 Muzio Clementi (1752-1832). (Lithography by H. Bodmer, courtesy of Zentralbibliothek Zurich.)

Figure 11.5 Freddy (by J.B.) and Johannes Brahms (1833-1897) going for a drink. (Caricature from a contemporary newspaper; courtesy of Zentralbibliothek Zurich.)


List of figures

Figure 1.1: Quantitative analysis of music helps to understand creative processes. (Pierre Boulez, photograph courtesy of Philippe Gontier, Paris; and "Jim" by J.B.)

Figure 1.2: J.S. Bach (1685-1750). (Engraving by L. Sichling after a painting by Elias Gottlob Haussmann, 1746; courtesy of Zentralbibliothek Zurich.)

Figure 1.3: Ludwig van Beethoven (1770-1827). (Drawing by E. Durck after a painting by J.K. Stieler, 1819; courtesy of Zentralbibliothek Zurich.)

Figure 1.4: Anton Webern (1883-1945). (Courtesy of Osterreichische Post AG.)

Figure 1.5: Gottfried Wilhelm Leibniz (1646-1716). (Courtesy of Deutsche Post AG and Elisabeth von Janota-Bzowski.)

Figure 1.6: W.A. Mozart (1756-1791) (authorship uncertain) – Spiegel-Duett.

Figure 1.7: Wolfgang Amadeus Mozart (1756-1791). (Engraving by F. Muller after a painting by J.W. Schmidt; courtesy of Zentralbibliothek Zurich.)

Figure 1.8: The torus of thirds Z3 + Z4.

Figure 1.9: Arnold Schonberg – Sketch for the piano concerto op. 42 – notes with tone row and its inversions and transpositions. (Used by permission of Belmont Music Publishers.)

Figure 1.10: Notes of "Air" by Henry Purcell. (For better visibility, only a small selection of related "motifs" is marked.)

Figure 1.11: Notes of Fugue No. 1 (first half) from "Das Wohltemperierte Klavier" by J.S. Bach. (For better visibility, only a small selection of related "motifs" is marked.)

Figure 1.12: Notes of op. 68, No. 2 from "Album fur die Jugend" by Robert Schumann. (For better visibility, only a small selection of related "motifs" is marked.)

Figure 1.13: A miraculous transformation caused by high exposure to Wagner operas. (Caricature from a 19th century newspaper; courtesy of Zentralbibliothek Zurich.)


Figure 1.14: Graphical representation of pitch and onset time in Z271 together with instrumentation of polygonal areas. (Excerpt from Santi – Piano concert No. 2 by Jan Beran, col legno CD 20062; courtesy of col legno, Germany.)

Figure 1.15: Iannis Xenakis (1922-1998). (Courtesy of Philippe Gontier, Paris.)

Figure 1.16: Ludwig van Beethoven (1770-1827). (Courtesy of Zentralbibliothek Zurich.)

Figure 2.1: Robert Schumann (1810-1856) – Traumerei op. 15, No. 7.

Figure 2.2: Tempo curves of Schumann's Traumerei performed by Vladimir Horowitz.

Figure 2.3: Twenty-eight tempo curves of Schumann's Traumerei performed by 24 pianists. (For Cortot and Horowitz, three tempo curves were available.)

Figure 2.4: Boxplots of descriptive statistics for the 28 tempo curves in Figure 2.3.

Figure 2.5: q-q-plots of several tempo curves (from Figure 2.3).

Figure 2.6: Frequencies of notes 0,1,...,11 for moving windows of onset-length 16.

Figure 2.7: Frequencies of notes 0,1,...,11 for moving windows of onset-length 16.

Figure 2.8: Johannes Chrysostomus Wolfgangus Theophilus Mozart (1756-1791) in the house of Salomon Gessner in Zurich. (Courtesy of Zentralbibliothek Zurich.)

Figure 2.9: R. Schumann (1810-1856) – lithography by H. Bodmer. (Courtesy of Zentralbibliothek Zurich.)

Figure 2.10: Acceleration of tempo curves for Cortot and Horowitz.

Figure 2.11: Tempo acceleration – correlation with other performances.

Figure 2.12: Martha Argerich – interpolation of tempo curve by cubic splines.

Figure 2.13: Smoothed tempo curves $g_1(t) = (nb_1)^{-1}\sum K(\frac{t-t_i}{b_1})y_i$ ($b_1 = 8$).

Figure 2.14: Smoothed tempo curves $g_2(t) = (nb_2)^{-1}\sum K(\frac{t-t_i}{b_2})[y_i - g_1(t_i)]$ ($b_2 = 1$).

Figure 2.15: Smoothed tempo curves $g_3(t) = (nb_3)^{-1}\sum K(\frac{t-t_i}{b_3})[y_i - g_1(t_i) - g_2(t_i)]$ ($b_3 = 1/8$).

Figure 2.16: Smoothed tempo curves – residuals $e(t_i) = y_i - g_1(t_i) - g_2(t_i) - g_3(t_i)$.


Figure 2.17: Melodic indicator – local polynomial fits together with first and second derivatives.

Figure 2.18: Tempo curves (Figure 2.3) – first derivatives obtained from local polynomial fits (span 24/32).

Figure 2.19: Tempo curves (Figure 2.3) – second derivatives obtained from local polynomial fits (span 8/32).

Figure 2.20: Kinderszene No. 4 – sound wave of performance by Horowitz at the Royal Festival Hall in London on May 22, 1982.

Figure 2.21: log(Amplitude) and tempo for Kinderszene No. 4 – auto- and cross correlations (Figure 2.24a), scatter plot with fitted least squares and robust lines (Figure 2.24b), time series plots (Figure 2.24c), and sharpened scatter plot (Figure 2.24d).

Figure 2.22: Horowitz' performance of Kinderszene No. 4 – log(tempo) versus log(Amplitude) and boxplots of log(tempo) for three ranges of amplitude.

Figure 2.23: Horowitz' performance of Kinderszene No. 4 – two-dimensional histogram of (x, y) = (log(tempo), log(Amplitude)) displayed in a perspective and image plot respectively.

Figure 2.24: Horowitz' performance of Kinderszene No. 4 – kernel estimate of the two-dimensional distribution of (x, y) = (log(tempo), log(Amplitude)) displayed in a perspective and image plot respectively.

Figure 2.25: R. Schumann, Traumerei op. 15, No. 7 – density of melodic indicator with sharpening region (a) and melodic curve plotted against onset time, with sharpening points highlighted (b).

Figure 2.26: R. Schumann, Traumerei op. 15, No. 7 – tempo by Cortot and Horowitz at sharpening onset times.

Figure 2.27: R. Schumann, Traumerei op. 15, No. 7 – tempo "derivatives" for Cortot and Horowitz at sharpening onset times.

Figure 2.28: Arnold Schonberg (1874-1951), self-portrait. (Courtesy of Verwertungsgesellschaft Bild-Kunst, Bonn.)

Figure 2.29: a) Chernoff faces for 1. Saltarello (Anonymus, 13th century); 2. Prelude and Fugue No. 1 from "Das Wohltemperierte Klavier" (J.S. Bach, 1685-1750); 3. Kinderszene op. 15, No. 1 (R. Schumann, 1810-1856); 4. Piano piece op. 19, No. 2 (A. Schonberg, 1874-1951); 5. Rain Tree Sketch 1 (T. Takemitsu, 1930-1996); b) Chernoff faces for the same compositions as in Figure 2.29a, after permuting coordinates.

Figure 2.30: The minnesinger Burchard von Wengen (1229-1280), contemporary of Adam de la Halle (1235?-1288). (From Codex Manesse, courtesy of the University Library Heidelberg.) (Color figures follow page 168.)


Figure 2.31: Star plots of p∗j = (p6, p11, p4, p9, p2, p7, p12, p5, p10, p3, p8)t for compositions from the 13th to the 20th century.

Figure 2.32: Symbol plot of the distribution of successive interval pairs (∆y(ti), ∆y(ti+1)) (a, c) and their absolute values (b, d) respectively, for the upper envelopes of Bach's Praludium No. 1 (Das Wohltemperierte Klavier I) and Mozart's Sonata KV 545 (beginning of 2nd movement).

Figure 2.33: Symbol plot of the distribution of successive interval pairs (∆y(ti), ∆y(ti+1)) (a, c) and their absolute values (b, d) respectively, for the upper envelopes of Scriabin's Prelude op. 51, No. 4 and F. Martin's Prelude No. 6.

Figure 2.34: Symbol plot with x = pj5, y = pj7 and radius of circles proportional to pj1.

Figure 2.35: Symbol plot with x = pj5, y = pj7 and radius of circles proportional to pj6. (Color figures follow page 168.)

Figure 2.36: Symbol plot with x = pj5, y = pj7. The rectangles have width pj1 (diminished second) and height pj6 (augmented fourth). (Color figures follow page 168.)

Figure 2.37: Symbol plot with x = pj5, y = pj7, and triangles defined by pj1 (diminished second), pj6 (augmented fourth) and pj10 (diminished seventh). (Color figures follow page 168.)

Figure 2.38: Names plotted at locations (x, y) = (pj5, pj7). (Color figures follow page 168.)

Figure 2.39: Profile plots of p∗j = (p5, p10, p3, p8, p1, p6, p11, p4, p9, p2, p7)t.

Figure 3.1: Ludwig Boltzmann (1844-1906). (Courtesy of Osterreichische Post AG.)

Figure 3.2: Fractal pictures (by Celine Beran, computer generated.) (Color figures follow page 168.)

Figure 3.3: Gyorgy Ligeti (*1923). (Courtesy of Philippe Gontier, Paris.)

Figure 3.4: Comparison of entropies 1, 2, 3, and 4 for J.S. Bach's Cello Suite No. I and R. Schumann's op. 15, No. 2, 3, 4, and 7, and op. 68, No. 2 and 16.

Figure 3.5: Alexander Scriabin (1871-1915) (at the piano) and the conductor Serge Koussevitzky. (Painting by Robert Sterl, 1910; courtesy of Gemaldegalerie Neuer Meister, Dresden, and Robert-Sterl-House.)

Figure 3.6: Comparison of entropies 9 and 10 for Bach, Schumann, and Scriabin/Martin.

Figure 3.7: Metric, melodic, and harmonic global indicators for Bach's Canon cancricans.

Figure 3.8: Robert Schumann (1810-1856). (Courtesy of Zentralbibliothek Zurich.)


Figure 3.9: Metric, melodic, and harmonic global indicators for Schumann's op. 15, No. 2 (upper figure), together with smoothed versions (lower figure).

Figure 3.10: Metric, melodic, and harmonic global indicators for Schumann's op. 15, No. 7 (upper figure), together with smoothed versions (lower figure).

Figure 3.11: Metric, melodic, and harmonic global indicators for Webern's Variations op. 27, No. 2 (upper figure), together with smoothed versions (lower figure).

Figure 3.12: R. Schumann – Traumerei: motifs used for specific melodic indicators.

Figure 3.13: R. Schumann – Traumerei: indicators of individual motifs.

Figure 3.14: R. Schumann – Traumerei: contributions of individual motifs to overall melodic indicator.

Figure 3.15: R. Schumann – Traumerei: overall melodic indicator.

Figure 4.1: Sound wave of c′ and f′ played on a piano.

Figure 4.2: Zoomed piano sound wave – shaded area in Figure 4.1.

Figure 4.3: Periodogram of piano sound wave in Figure 4.2.

Figure 4.4: Sound wave of e′′♭ played on a harpsichord.

Figure 4.5: Periodogram of harpsichord sound wave in Figure 4.4.

Figure 4.6: Harpsichord sound – periodogram plots for different time frames (moving windows of time points).

Figure 4.7: A harpsichord sound and its spectrogram. Intense pink corresponds to high values of I(t, λ). (Color figures follow page 168.)

Figure 4.8: A harpsichord sound wave (a), logarithm of squared amplitudes (b), histogram of the series (c) and its periodogram on log-scale (d) together with fitted SEMIFAR-spectrum.

Figure 4.9: Log-frequencies with fitted SEMIFAR-trend and log-log-periodogram together with SEMIFAR-fit for Bach's first Cello Suite (1st movement; a, b) and Paganini's Capriccio No. 24 (c, d) respectively.

Figure 4.10: Local variability with fitted SEMIFAR-trend and log-log-periodogram together with SEMIFAR-fit for Bach's first Cello Suite (1st movement; a, b) and Paganini's Capriccio No. 24 (c, d) respectively.

Figure 4.11: Niccolo Paganini (1782-1840). (Courtesy of Zentralbibliothek Zurich.)

Figure 5.1: Simulated signal (a) and wavelet coefficients (b); (c) and (d): wavelet components of simulated signal in a; (e) and (f): wavelet components of simulated signal in a and frequency plot of coefficients.

Figure 5.2: Decomposition of x-series in simulated HIWAVE model.


Figure 5.3: Simulated HIWAVE model – explanatory series g1 (a), y-series (b), y versus x (c), y versus g1 (d), y versus g2 = x − g1 (e) and time frequency plot of y (f).

Figure 5.4: HIWAVE time series and fitted function g1.

Figure 5.5: Hierarchical decomposition of metric, melodic, and harmonic indicators for Bach's "Canon cancricans" (Das Musikalische Opfer BWV 1079) and Webern's Variation op. 27, No. 2.

Figure 5.6: Quantitative analysis of performance data is an attempt to understand "objectively" how musicians interpret a score without attaching any subjective judgement. (Left: "Freddy" by J.B.; right: J.S. Bach, woodcutting by Ernst Wurtemberger, Zurich. Courtesy of Zentralbibliothek Zurich.)

Figure 5.7: Most important melodic curves obtained from HIREG fit to tempo curves for Schumann's Traumerei.

Figure 5.8: Successive aggregation of HIREG-components for tempo curves by Ashkenazy and Horowitz (third performance).

Figure 5.9 a and b: HISMOOTH-fits to tempo curves (performances 1-14); Figure 5.9 c and d: HISMOOTH-fits to tempo curves (performances 15-28).

Figure 5.10: Time frequency plots for Cortot's and Horowitz's three performances.

Figure 5.11: Wavelet coefficients for Cortot's and Horowitz's three performances.

Figure 5.12: Tempo curves – approximation by most important 2 best basis functions.

Figure 5.13: Tempo curves – approximation by most important 5 best basis functions.

Figure 5.14: Tempo curves – approximation by most important 10 best basis functions.

Figure 5.15: Tempo curves (a) by Cortot (three curves on top) and Horowitz, R2 obtained in HIWAVE-fit plotted against trial cut-off parameter η (b) and fitted HIWAVE-curves (c).

Figure 5.16: First derivative of tempo curves (a) by Cortot (three curves on top) and Horowitz, R2 obtained in HIWAVE-fit plotted against trial cut-off parameter η (b) and fitted HIWAVE-curves (c).

Figure 5.17: Second derivative of tempo curves (a) by Cortot (three curves on top) and Horowitz, R2 obtained in HIWAVE-fit plotted against trial cut-off parameter η (b) and fitted HIWAVE-curves (c).

Figure 6.1: Jean-Philippe Rameau (1683-1764). (Engraving by A. St. Aubin after J. J. Cafferi, Paris after 1764; courtesy of Zentralbibliothek Zurich.)


Figure 6.2: Frederic Chopin (1810-1849). (Courtesy of Zentralbibliothek Zurich.)

Figure 6.3: Stationary distributions πj (j = 1, ..., 11) of Markov chains with state space Z12\{0}, estimated for the transition between successive intervals.

Figure 6.4: Cluster analysis based on stationary Markov chain distributions for compositions by Bach, Mozart, Haydn, Chopin, Schumann, Brahms, and Rachmaninoff.

Figure 6.5: Cluster analysis based on stationary Markov chain distributions of torus distances for compositions by Bach, Mozart, Haydn, Chopin, Schumann, Brahms, and Rachmaninoff.

Figure 6.6: Comparison of log odds ratios log(π1/π2) of stationary Markov chain distributions of torus distances.

Figure 6.7: Comparison of log odds ratios log(π1/π3) of stationary Markov chain distributions of torus distances.

Figure 6.8: Comparison of log odds ratios log(π2/π3) of stationary Markov chain distributions of torus distances.

Figure 6.9: Comparison of log odds ratios log(π1/π3) and log(π2/π3) of stationary Markov chain distributions of torus distances.

Figure 6.10: Comparison of stationary Markov chain distributions of torus distances.

Figure 6.11: Log odds ratios log(π1/π3) and log(π2/π3) plotted against date of birth of composer.

Figure 6.12: Johannes Brahms (1833-1897). (Courtesy of Zentralbibliothek Zurich.)

Figure 7.1: Bela Bartok – statue by Varga Imre in front of the Bela Bartok Memorial House in Budapest. (Courtesy of the Bela Bartok Memorial House.)

Figure 7.2: Sergei Prokoffieff as a child. (Courtesy of Karadar Bertoldi Ensemble; www.karadar.net/Ensemble/.)

Figure 7.3: Circular representation of compositions by J. S. Bach (Praludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).

Figure 7.4: Boxplots of λ1, R, d and log m for notes modulo 12, comparing Bach, Scarlatti, Bartok, and Prokoffieff.

Figure 7.5: Circular representation of intervals of successive notes in the following compositions: J. S. Bach (Praludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).


Figure 7.6: Boxplots of λ1, R, d and log m for note intervals modulo 12, comparing Bach, Scarlatti, Bartok, and Prokoffieff.

Figure 7.7: Circular representation of notes ordered according to the circle of fourths in the following compositions: J. S. Bach (Praludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).

Figure 7.8: Boxplots of λ1, R, d and log m for notes modulo 12 ordered according to the circle of fourths, comparing Bach, Scarlatti, Bartok, and Prokoffieff.

Figure 7.9: Circular representation of intervals of successive notes ordered according to the circle of fourths in the following compositions: J. S. Bach (Praludium und Fuge No. 5 from "Das Wohltemperierte Klavier"), D. Scarlatti (Sonata Kirkpatrick No. 125), B. Bartok (Bagatelles No. 3), and S. Prokoffieff (Visions fugitives No. 8).

Figure 7.10: Boxplots of λ1, R, d and log m for note intervals modulo 12 ordered according to the circle of fourths, comparing Bach, Scarlatti, Bartok, and Prokoffieff.

Figure 8.1: Tempo curves for Schumann's Traumerei: skewness for the eight parts A1, A2, A′1, A′2, B1, B2, A′′1, A′′2 for 28 performances, plotted against the number of the part.

Figure 8.2: Schumann's Traumerei: screeplot for skewness.

Figure 8.3: Schumann's Traumerei: loadings for PCA of skewness.

Figure 8.4: Schumann's Traumerei: symbol plot of principal components z2, ..., z5 for PCA of tempo skewness.

Figure 8.5: Schumann's Traumerei: tempo curves by Cortot, Horowitz, Brendel, and Gianoli.

Figure 8.6: Air by Henry Purcell (1659-1695).

Figure 8.7: Screeplot for PCA of entropies.

Figure 8.8: Loadings for PCA of entropies.

Figure 8.9: Entropies – symbol plot of the first four principal components.

Figure 8.10: Entropies – symbol plot of principal components no. 2-5.

Figure 8.11: F. Martin (1890-1971). (Courtesy of the Societe Frank Martin and Mrs. Maria Martin.)

Figure 8.12: F. Martin (1890-1971) – manuscript from 8 Preludes. (Courtesy of the Societe Frank Martin and Mrs. Maria Martin.)

Figure 9.1: Discriminant analysis combined with time series analysis can be used to judge purity of intonation ("Elvira" by J.B.).

Figure 9.2: Linear discriminant analysis of compositions before and after 1800, with the training sample. The data used for the discriminant rule consists of x = (p5, E).


Figure 9.3: Linear discriminant analysis of compositions before and after 1800, with the validation sample. The data used for the discriminant rule consists of x = (p5, E).

Figure 9.4: Linear discriminant analysis of "Early Music to Baroque" and "Romantic to 20th Century". The points ("o" and "×") belong to the training sample. The data used for the discriminant rule consists of x = (p5, E).

Figure 9.5: Linear discriminant analysis of "Early Music to Baroque" and "Romantic to 20th century". The points ("o" and "×") belong to the validation sample. The data used for the discriminant rule consists of x = (p5, E).

Figure 9.6: Graduale written for an Augustinian monastery of the diocese Konstanz, 13th century. (Courtesy of Zentralbibliothek Zurich.) (Color figures follow page 168.)

Figure 9.7: Johannes Brahms (1833-1897). (Photograph by Maria Fellinger, courtesy of Zentralbibliothek Zurich.)

Figure 9.8: Richard Wagner (1813-1883). (Engraving by J. Bankel after a painting by C. Jager, courtesy of Zentralbibliothek Zurich.)

Figure 10.1: Complete linkage clustering of log-odds-ratios of note-frequencies.

Figure 10.2: Single linkage clustering of log-odds-ratios of note-frequencies.

Figure 10.3: Joseph Haydn (1732-1809). (Title page of a biography published by the Allgemeine Musik-Gesellschaft Zurich, 1830; courtesy of Zentralbibliothek Zurich.)

Figure 10.4: Klavierstuck op. 19, No. 2 by Arnold Schonberg. (Facsimile; used by permission of Belmont Music Publishers.)

Figure 10.5: Complete linkage clustering of entropies.

Figure 10.6: Complete linkage clustering of tempo.

Figure 10.7: Complete linkage clustering of HISMOOTH-fits to tempo curves.

Figure 10.8: Symbol plot of HISMOOTH bandwidths for tempo curves. The radius of each circle is proportional to a constant plus log b3; the horizontal and vertical axes are equal to b1 and b2 respectively. The letters A–F indicate where at least one observation from the corresponding cluster occurs.

Figure 10.9: Maurizio Pollini (*1942). (Courtesy of Philippe Gontier, Paris.)

Figure 11.1: Two-dimensional multidimensional scaling of compositions ranging from the 13th to the 20th century, based on frequencies of intervals and interval sequences.

Figure 11.2: Boxplots of second MDS-component where compositions are classified according to three time periods.


Figure 11.3: Fragment of a graduale from the 14th century. (Courtesy of Zentralbibliothek Zurich.)

Figure 11.4: Muzio Clementi (1752-1832). (Lithography by H. Bodmer, courtesy of Zentralbibliothek Zurich.)

Figure 11.5: Freddy (by J.B.) and Johannes Brahms (1833-1897) going for a drink. (Caricature from a contemporary newspaper; courtesy of Zentralbibliothek Zurich.)


References

Akaike, H. (1973a). Information theory and an extension of the maximum likelihood principle. In: Second International Symposium on Information Theory, B.N. Petrov and F. Csaki (eds.), Akademiai Kiado, Budapest, 267-281.

Akaike, H. (1973b). Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika, Vol. 60, 255-265.

Akaike, H. (1979). A Bayesian extension of the minimum AIC procedure of autoregressive model fitting. Biometrika, Vol. 66, 237-242.

Albert, A.A. (1956). Fundamental Concepts of Higher Algebra. University of Chicago Press, Chicago.

Anderberg, M.R. (1973). Cluster Analysis for Applications. Academic Press, New York and London.

Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis (2nd ed.). Wiley, New York.

Andreatta, M. (1997). Group-theoretical methods applied to music. PhD thesis, University of Sussex.

Andreatta, M., Noll, T., Agon, C. and Assayag, G. (2001). The geometrical groove: rhythmic canons between theory, implementation and musical experiment. In: Les Actes des 8emes Journees d'Informatique Musicale, Bourges, 7-9 juin 2001, p. 93-97.

Antoniadis, A. and Oppenheim, G. (1995). Wavelets and Statistics. Lecture Notes in Statistics, No. 103, Springer, New York.

Arabie, P., Hubert, L.J. and De Soete, G. (1996). Clustering and Classification. World Scientific Pub., London.

Archibald, B. (1972). Some thoughts on symmetry in early Webern. Persp. New Music, 10, 159-163.

Ash, R.B. (1965). Information Theory. Wiley, New York.

Ashby, W.R. (1956). An Introduction to Cybernetics. Wiley, New York.

Babbitt, M. (1960). Twelve-tone invariants as compositional determinant. Musical Quarterly, 46, 245-259.

Babbitt, M. (1961). Set structure as a compositional determinant. JMT, 5, No. 2, 72-94.

Babbitt, M. (1987). Words about Music. Dembski A. and Straus J.N. (eds.), University of Wisconsin Press, Madison.

Backus, J. (1969). The Acoustical Foundations of Music. W.W. Norton & Co., New York (reprinted 1977).

Bailhache, P. (2001). Une Histoire de l’Acoustique Musicale, CNRS Editions.

Balzano, G.J. (1980). The group-theoretic description of 12-fold and microtonalpitch systems. Computer Music Journal, Vol. 4, No. 4, 66-84.

©2004 CRC Press LLC

Barnard, G.A. (1951). The theory of information. J. Royal Statist. Soc., Series B, Vol. 13, 46-69.

Bartlett, M.S. (1955). An Introduction to Stochastic Processes. Cambridge University Press, Cambridge.

Batschelet, E. (1981). Circular Statistics. Academic Press, London.

Beament, J. (1997). The Violin Explained: Components, Mechanism, and Sound. Oxford University Press, Oxford.

Benade, A.H. (1976). Fundamentals of Musical Acoustics. Oxford University Press, Oxford. (Reprinted by Dover in 1990.)

Benson, D. (1995-2002). Mathematics and Music. Internet Lecture Notes, Department of Mathematics, University of Georgia, USA (available at http://www.math.uga.edu/~djb/html/math-music.html).

Beran, J. (1987). Aniseikonia. H.O.E. (Bison Records).

Beran, J. (1991). Cirri. Centaur Records, CRC 2100.

Beran, J. (1994). Statistics for Long-Memory Processes. Chapman & Hall, New York.

Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short- and long-memory ARIMA models. J. R. Statist. Soc., Series B, Vol. 57, No. 4, 659-672.

Beran, J. (1998). Modeling and objective distinction of trends, stationarity and long-range dependence. Proceedings of the VIIth International Congress of Ecology - INTECOL 98, Farina, A., Kennedy, J. and Bossu, V. (Eds.), p. 41.

Beran, J. (2000). Santi. col legno, WWE 1CD 20062 (http://www.col-legno.de).

Beran, J. and Feng, Y. (2002a). SEMIFAR models – a semiparametric framework for modelling trends, long-range dependence and nonstationarity. Computational Statistics & Data Analysis, Vol. 40, No. 2, 393-419.

Beran, J. and Feng, Y. (2002b). Iterative plug-in algorithms for SEMIFAR models – definition, convergence, and asymptotic properties. J. Computational Graphical Statist., Vol. 11, No. 3, 690-713.

Beran, J. and Ghosh, S. (2000). Estimation of the dominating frequency for stationary and nonstationary fractional autoregressive processes. J. Time Series Analysis, Vol. 21, No. 5, 513-533.

Beran, J. and Mazzola, G. (1992). Immaculate Concept. SToA music, 1 CD 1002.92, Zurich.

Beran, J. and Mazzola, G. (1999). Analyzing musical structure and performance – a statistical approach. Statistical Science, Vol. 14, No. 1, pp. 47-79.

Beran, J. and Mazzola, G. (1999). Visualizing the relationship between two time series by hierarchical smoothing. J. Computational Graphical Statist., Vol. 8, No. 2, pp. 213-238.

Beran, J. and Mazzola, G. (2000). Timing Microstructure in Schumann's "Träumerei" as an Expression of Harmony, Rhythm, and Motivic Structure in Music Performance. Computers Mathematics Appl., Vol. 39, No. 5-6, pp. 99-130.

Beran, J. and Mazzola, G. (2001). Musical composition and performance – statistical decomposition and interpretation. Student, Vol. 4, No. 1, 13-42.

Beran, J. and Ocker, D. (1999). SEMIFAR forecasts, with applications to foreign exchange rates. J. Statistical Planning Inference, 80, 137-153.

Beran, J. and Ocker, D. (2001). Volatility of stock market indices – an analysis based on SEMIFAR models. J. Bus. Economic Statist., Vol. 19, No. 1, 103-116.

Berg, R.E. and Stork, D.G. (1995). The Physics of Sound (2nd ed.). Prentice Hall, New Jersey.

Berry, W. (1987). Structural Function in Music. Dover, Mineola.

Besag, J. (1989). Towards Bayesian image analysis. J. Appl. Statistics, Vol. 16, 395-407.

Besicovitch, A.S. (1935). On the sum of digits of real numbers represented in the dyadic system (On sets of fractional dimensions II). Mathematische Annalen, Vol. 110, 321-330.

Besicovitch, A.S. and Ursell, H.D. (1937). Sets of fractional dimensions (V): On dimensional numbers of some continuous curves. J. London Mathematical Society, Vol. 29, 449-459.

Bhattacharyya, A. (1946a). On some analogues of the amount of information and their use in statistical estimation. Sankhya, Vol. 8, 1-14.

Bhattacharyya, A. (1946b). On a measure of divergence between two multinomial populations. Sankhya, 7, 401-406.

Billingsley, P. (1986). Probability and Measure (2nd ed.). Wiley, New York.

Blashfield, R.K. and Aldenderfer, M.S. (1985). Cluster Analysis. Sage, London.

Boltzmann, L. (1896). Vorlesungen über Gastheorie. Johann Ambrosius Barth, Leipzig.

Borg, I. and Groenen, P. (1997). Modern Multidimensional Scaling: Theory and Applications. Springer, New York.

Bowman, A.W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford University Press, Oxford.

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.

Breiman, L. (1984). Classification and Regression Trees. CRC Press, Boca Raton.

Bremaud, P. (1999). Markov Chains. Springer, New York.

Brillouin, L. (1956). Science and Information Theory. Academic Press, New York.

Brillinger, D. (1981). Time Series: Data Analysis and Theory (expanded ed.). Holden-Day, San Francisco.

Brillinger, D. and Irizarry, R.A. (1998). An investigation of the second- and higher-order spectra of music. Signal Processing, Vol. 65, 161-179.

Brigham, E.O. (1988). The Fast Fourier Transform and Applications. Prentice Hall, New Jersey.

Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods (2nd ed.). Springer, New York.

Brown, E.N. (1990). A note on the asymptotic distribution of the parameter estimates for the harmonic regression model. Biometrika, Vol. 77, No. 3, 653-656.

Chai, W. and Vercoe, B. (2001). Folk music classification using hidden Markov models. Proceedings of International Conference on Artificial Intelligence, June 2001 (//web.media.mit.edu/~chaiwei/papers/chai ICAI183.pdf).

Chambers, J., Cleveland, W., Kleiner, B., and Tukey, P. (1983). Graphical Methods for Data Analysis. Wadsworth Publishing Company, Belmont, California.

Chernick, M.R. (1999). Bootstrap Methods: A Practitioner's Guide. Jossey-Bass, New York.

Chung, K.L. (1967). Markov Chains with Stationary Transition Probabilities. Springer, Berlin.

Cleveland, W. (1985). Elements of Graphing Data. Wadsworth Publishing Company, Belmont, California.

Coifman, R., Meyer, Y., and Wickerhauser, V. (1992). Wavelet analysis and signal processing. In: Wavelets and Their Applications, pp. 153-178. Jones and Bartlett Publishers, Boston.

Coifman, R. and Wickerhauser, V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, Vol. 38, No. 2, 713-718.

Conway, J.H. and Sloane, N.J.A. (1988). Sphere Packings, Lattices and Groups. Grundlehren der mathematischen Wissenschaften 290, Springer, Berlin.

Cooley, J.W. and Tukey, J.W. (1965). An algorithm for the machine calculation of complex Fourier series. Math. Comput., Vol. 19, 297-301.

Cox, T.F. and Cox, M.A.A. (1994). Multidimensional Scaling. Chapman & Hall, London.

Cremer, L. (1984). The Physics of the Violin. MIT Press.

Crocker, M.J. (ed.) (1998). Handbook of Acoustics. Wiley Interscience, New York.

Dahlhaus, R. (1987). Efficient parameter estimation for self-similar processes. Ann. Statist., Vol. 17, 1749-1766.

Dahlhaus, R. (1996a). Maximum likelihood estimation and model selection for locally stationary processes. J. Nonpar. Statist., Vol. 6, 171-191.

Dahlhaus, R. (1996b). Asymptotic statistical inference for nonstationary processes with evolutionary spectra. In: Athens Conference on Applied Probability and Time Series, Vol. II, P.M. Robinson and M. Rosenblatt (Eds.), 145-159, Lecture Notes in Statistics, 115, Springer, New York.

Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Ann. Statistics, Vol. 25, 1-37.

Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia, PA.

Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge University Press, Cambridge.

de la Motte-Haber, H. (1996). Handbuch der Musikpsychologie (2nd ed.). Laaber Verlag, Laaber.

Devaney, R.L. (1990). Chaos, Fractals and Dynamics. Addison-Wesley, California.

Diaconis, P., Graham, R.L., and Kantor, W.M. (1983). The mathematics of perfect shuffles. Adv. Appl. Math., Vol. 4, 175-196.

Diggle, P. (1990). Time Series – A Biostatistical Introduction. Oxford University Press, Oxford.

Dillon, W.R. and Goldstein, M. (1984). Multivariate Analysis, Methods and Applications. Wiley, New York.

Donoho, D.L. and Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. JASA, 90, 1200-1224.

Donoho, D.L. and Johnstone, I.M. (1998). Minimax estimation via wavelet shrinkage. Ann. Statistics, 26, 879-921.

Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., and Picard, D. (1995). Wavelet shrinkage: Asymptopia? J. R. Statist. Soc., Series B, 57, 301-337.

Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statistics, 24, 508-539.

Draper, N.R. and Smith, H. (1998). Applied Regression Analysis (3rd ed.). Wiley, New York.

Duda, R.O., Hart, P.E. and Stork, D.G. (2000). Pattern Classification (2nd ed.). Wiley, New York.

Edgar, G.A. (1990). Measure, Topology and Fractal Geometry. Springer, New York.

Effelsberg, W. and Steinmetz, R. (1998). Video Compression Techniques. Dpunkt Verlag, Heidelberg.

Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statistics, Vol. 7, 1-26.

Eimert, H. (1964). Grundlagen der musikalischen Reihentechnik. Universal Edition, Vienna.

Elliott, R.J., Aggoun, L., and Moore, J.B. (1995). Hidden Markov Models: Estimation and Control. Springer, New York.

Erdős, P. (1946). On the distribution function of additive functions. Ann. Mathematics, Vol. 43, 1-20.

Eubank, R.L. (1999). Nonparametric Regression and Spline Smoothing (2nd ed.). Marcel Dekker, New York.

Everitt, B.S., Landau, S. and Leese, M. (2001). Cluster Analysis (4th ed.). Oxford University Press, Oxford.

Everitt, B.S. and Rabe-Hesketh, S. (1997). The Analysis of Proximity Data. Arnold, London.

Falconer, K.J. (1985). The Geometry of Fractal Sets. Cambridge University Press, Cambridge.

Falconer, K.J. (1986). Random fractals. Math. Proc. Cambridge Philos. Soc., Vol. 100, 559-582.

Falconer, K.J. (1990). Fractal Geometry. Wiley, New York.

Fan, J. and Gijbels, I. (1995). Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. J. R. Statist. Soc., Ser. B, 57, 371-394.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modeling and Its Applications. Chapman & Hall, London.

Feng, Y. (1999). Kernel- and Locally Weighted Regression – with Applications to Time Series Decomposition. Verlag für Wissenschaft und Forschung, Berlin.

Fisher, N.I. (1993). Statistical Analysis of Circular Data. Cambridge University Press, Cambridge.

Fisher, R.A. (1925). Theory of statistical estimation. Proc. Camb. Phil. Soc., Vol. 22, pp. 700-725.

Fisher, R.A. (1956). Statistical Methods and Scientific Inference. Oliver & Boyd, London.

Fleischer, A. (2003). Die analytische Interpretation. Schritte zur Erschließung eines Forschungsfeldes am Beispiel der Metrik. PhD dissertation, Humboldt-University Berlin. dissertation.de, Verlag im Internet GmbH, Berlin.

Fleischer, A., Mazzola, G., and Noll, Th. (2000). Zur Konzeption der Software RUBATO für musikalische Analyse und Performance. Musiktheorie, Heft 4, pp. 314-325.

Fletcher, T.J. (1956). Campanological groups. American Math. Monthly, 63/9, 619-626.

Fletcher, N.H. and Rossing, T.D. (1991). The Physics of Musical Instruments. Springer, Berlin/New York.

Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. Cambridge University Press, Cambridge, UK.

Forte, A. (1964). A theory of set-complexes for music. JMT, 8, No. 2, 136-183.

Forte, A. (1973). Structure of Atonal Music. Yale University Press, New Haven.

Forte, A. (1989). La set-complex theory: élevons les enjeux! Analyse musicale, 4ème trimestre, 80-86.

Fox, R. and Taqqu, M.S. (1986). Large sample properties of parameter estimates for strongly dependent stationary Gaussian time series. Ann. Statistics, Vol. 14, 517-532.

Friedman, J.H. (1977). A recursive partitioning decision rule for nonparametric classification. IEEE Transactions on Computers, Vol. 26, No. 4, 404-408.

Fripertinger, H. (1991). Enumeration in music theory. Séminaire Lotharingien de Combinatoire, 26, 29-42.

Fripertinger, H. (1999). Enumeration and construction in music theory. In: Diderot Forum on Mathematics and Music: Computational and Mathematical Methods in Music, Vienna, Austria, December 2-4, 1999. H.G. Feichtinger and M. Dörfler, editors. Österreichische Computergesellschaft, 179-204.

Fripertinger, H. (2001). Enumeration of non-isomorphic canons. Tatra Mountains Math. Publ., 23.

Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition (2nd ed.). Academic Press, New York.

Gasser, T. and Müller, H.G. (1979). Kernel estimation of regression functions. In: Smoothing Techniques for Curve Estimation, Gasser, T., Rosenblatt, M. (Eds.), Springer, New York, pp. 23-68.

Gasser, T. and Müller, H.G. (1984). Estimating regression functions and their derivatives by the kernel method. Scand. J. Statist., Vol. 11, 171-185.

Gasser, T., Müller, H.G., and Mammitzsch, V. (1985). Kernels for nonparametric curve estimation. J. R. Statist. Soc., Ser. B, Vol. 47, 238-252.

Genevois, H. and Orlarey, Y. (1997). Musique et Mathématiques. Aléas-Grame, Lyon.

Gervini, D. and Yohai, V.J. (2002). A class of robust and fully efficient regression estimators. Ann. Statistics, Vol. 30, 583-616.

Ghosh, S. (1996). A new graphical tool to detect non-normality. J. R. Statist. Society, Series B, Vol. 58, 691-702.

Ghosh, S. (1999). T3-plot. In: Encyclopedia of Statistical Sciences, Update Volume 3 (S. Kotz, ed.), pp. 739-744, Wiley, New York.

Ghosh, S. and Beran, J. (2000). Comparing two distributions: The two sample T3 plot. J. Computational Graphical Statist., Vol. 9, No. 1, 167-179.

Gilbert, W.J. (2002). Modern Algebra with Applications. Wiley, New York.

Ghosh, S. and Draghicescu, D. (2002a). Predicting the distribution function for long-memory processes. Int. J. Forecasting, 18, 283-290.

Ghosh, S. and Draghicescu, D. (2002b). An algorithm for optimal bandwidth selection for smooth nonparametric quantiles and distribution functions. In: Statistics in Industry and Technology: Statistical Data Analysis Based on the L1-Norm and Related Methods, Dodge, Y. (Ed.), Birkhäuser Verlag, Basel, Switzerland, pp. 161-168.

Ghosh, S., Beran, J. and Innes, J. (1997). Nonparametric conditional quantile estimation in the presence of long memory. Student – Special issue on the conference on L1-Norm and related methods, Vol. 2, 109-117.

Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. (Eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall, London.

Goldman, S. (1953). Information Theory. Prentice Hall, New Jersey.

Good, P.I. (2001). Resampling Methods. Birkhäuser, Basel.

Gordon, A.D. (1999). Classification (2nd ed.). Chapman and Hall, London.

Götze, H. and Wille, R. (Eds.) (1985). Musik und Mathematik (Salzburger Musikgespräch 1984 unter Vorsitz von Herbert von Karajan). Springer, Berlin.

Graeser, W. (1924). Bachs "Kunst der Fuge". In: Bach-Jahrbuch, 1924.

Graff, K.F. (1975). Wave Motion in Elastic Solids. Oxford University Press. (Reprinted by Dover, 1991.)

Granger, C.W.J. and Joyeux, R. (1980). An introduction to long-range time series models and fractional differencing. J. Time Series Anal., Vol. 1, 15-30.

Grenander, U. and Szego, G. (1958). Toeplitz Forms and Their Application. Univ. California Press, Berkeley.

Grey, J. (1977). Multidimensional perceptual scaling of musical timbre. J. Acoustical Soc. America, Vol. 62, 1270-1277.

Grey, J. and Gordon, J. (1978). Perceptual effects of spectral modifications on musical timbres. J. Acoust. Soc. America, 63, 1493-1500.

Gromko, J.E. (1993). Perceptual differences between expert and novice music listeners: a multidimensional scaling analysis. Psychology of Music, 21, 34-47.

Guttman, L. (1954). A new approach to factor analysis: the radex. In: Mathematical thinking in the behavioral sciences, P. Lazarsfeld (Ed.), Free Press, New York, pp. 258-348.

Guttman, L. (1968). A general non-metric technique for finding the smallest coordinate space for a configuration of points. Psychometrika, 33, 469-506.

Hall, D.E. (1980). Musical Acoustics. Wadsworth Publishing Company, Belmont, California.

Halsey, D. and Hewitt, E. (1978). Eine gruppentheoretische Methode in der Musiktheorie. Jahresbericht der Deutschen Math. Vereinigung, Vol. 80.

Hampel, F.R., Ronchetti, E., Rousseeuw, P., and Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.

Hand, D.J. (1986). Discrimination and Classification. Wiley, New York.

Hand, D.J., Mannila, H., and Smyth, P. (2001). Principles of Data Mining. MIT Press, Cambridge (USA).

Hannan, E.J. (1973). The estimation of frequency. J. Appl. Probab., Vol. 10, 510-519.

Hannan, E.J. and Quinn, B.G. (1979). The determination of the order of an autoregression. J. R. Statist. Soc., Series B, Vol. 41, 190-195.

Härdle, W. (1991). Smoothing Techniques. Springer, New York.

Härdle, W., Kerkyacharian, G., Picard, D., and Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. Lecture Notes in Statistics, No. 129. Springer, New York.

Hartigan, J.A. (1975). Clustering Algorithms. Wiley, New York.

Hartley, R.V. (1928). Transmission of information. Bell Syst. Techn. J., 535-563.

Hassan, T. (1982). Nonlinear time series regression for a class of amplitude modulated cosinusoids. J. Time Series Analysis, Vol. 3, 109-122.

Hastie, T., Tibshirani, R., and Buja, A. (1994). Flexible discriminant analysis by optimal scoring. JASA, Vol. 89, 1255-1270.

Hastie, T., Tibshirani, R., and Friedman, J.H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.

Hausdorff, F. (1919). Dimension und äußeres Maß. Mathematische Annalen, Vol. 79, 157-179.

von Helmholtz, H. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage der Musik. Reprinted in Darmstadt, 1968.

Herstein, I.N. (1975). Topics in Algebra. Wiley, New York.

Hirst, D. (1996). Error-rate estimation in multiple-group linear discriminant analysis. Technometrics, Vol. 38, 389-399.

Hjort, N.L. and Glad, I.K. (2002). Nonparametric density estimation with a parametric start. Ann. Statistics, Vol. 23, No. 3, 882-904.

Hofstadter, D.R. (1999). Gödel, Escher, Bach. Basic Books, New York.

Höppner, F., Klawonn, F., Kruse, R. and Runkler, T. (1999). Fuzzy Cluster Analysis. Wiley, New York.

Hosking, J.R.M. (1981). Fractional differencing. Biometrika, Vol. 68, 165-176.

Howard, D.M. and Angus, J. (1996). Acoustics and Psychoacoustics. Focal Press.

Huber, P. (1981). Robust Statistics. Wiley, New York.

Huberty, C.J. (1994). Applied Discriminant Analysis. Wiley, New York.

Hurvich, C.M. and Ray, B.K. (1995). Estimation of the memory parameter for nonstationary or noninvertible fractionally integrated processes. J. Time Series Anal., Vol. 16, 17-41.

Irizarry, R.A. (1998). Statistics and music: fitting a local harmonic model to musical sound signals. PhD thesis, University of California, Berkeley.

Irizarry, R.A. (2000). Asymptotic distribution of estimates for a time-varying parameter in a harmonic model with multiple fundamentals. Statistica Sinica, Vol. 10, 1041-1067.

Irizarry, R.A. (2001). Local harmonic estimation in musical sound signals. JASA, Vol. 96, No. 454, 357-367.

Irizarry, R.A. (2002). Weighted estimation of harmonic components in a musical sound signal. J. Time Series Anal., Vol. 23, 29-48.

Isaacson, D.L. and Madsen, R.W. (1976). Markov Chains: Theory and Applications. Wiley, New York.

Jaffard, S., Meyer, Y., and Ryan, R. (2001). Wavelets: Tools for Science and Technology. SIAM, Philadelphia.

Jajuga, K., Sokołowski, A. and Bock, H.H. (Eds.) (2002). Statistical Pattern Recognition. Springer, New York.

Jammalamadaka, S.R. and SenGupta, A. (2001). Topics in Circular Statistics. Series on Multivariate Analysis, Vol. 5. World Scientific, River Edge, NJ.

Jansen, M. (2001). Noise Reduction by Wavelet Thresholding. Lecture Notes in Statistics, No. 161. Springer, New York.

Jardine, N. and Sibson, R. (1971). Mathematical Taxonomy. Wiley, New York.

Johnson, J. (1997). Graph Theoretical Methods of Abstract Musical Transformation. Greenwood Publishing Group, London.

Johnson, R.A. and Wichern, D.W. (2002). Applied Multivariate Statistical Analysis. Prentice Hall, New Jersey.

Johnston, I. (1989). Measured Tones: The Interplay of Physics and Music. Institute of Physics Publishing, Bristol and Philadelphia.

Joshi, D.D. (1957). L'information en statistique mathématique et dans la théorie des communications. PhD thesis, Faculté des Sciences de l'Université de Paris.

Juang, B.H. and Rabiner, L.R. (1991). Hidden Markov models for speech recognition. Technometrics, Vol. 33, 251-272.

Kaiser, G. (1994). A Friendly Guide to Wavelets. Birkhäuser, Boston.

Keil, W. (1991). Gibt es den Goldenen Schnitt in der Musik des 16. bis 19. Jahrhunderts? Eine kritische Untersuchung rezenter Forschungen. Augsburger Jahrbuch für Musikwissenschaft, Vol. 8, p. 7-70. Schneider, Tutzing, Germany.

Kelly, J.P. (1991). Hearing. In: Principles of Neural Science, E.R. Kandel, J.H. Schwarz, T.M. Jessel (Eds.), Elsevier, New York, pp. 481-499.

Kemeny, J.G., Snell, J.L., and Knapp, A.W. (1976). Denumerable Markov Chains. Springer, New York.

Khinchin, A.I. (1953). The entropy concept in probability theory. Uspekhi Matematicheskikh Nauk, Vol. 8, No. 3 (55), 3-20 (Russian).

Khinchin, A.I. (1956). On the fundamental theorems of information theory. Uspekhi Matematicheskikh Nauk, Vol. 11, No. 1 (67), 17-75 (Russian).

Kinsler, L.E., Frey, A.R., Coppens, A.B., and Sanders, J.V. (2000). Fundamentals of Acoustics (4th ed.). Wiley, New York.

Klecka, W.R. (1980). Discriminant Analysis. Sage, London.

Kolmogorov, A.N. (1956). On the Shannon theory of information transmission in the case of continuous signals. IRE Trans. on Inform. Theory, Vol. IT-2, 102-108.

Kono, N. (1986). Hausdorff dimension of sample paths for self-similar processes. In: Dependence in Probability and Statistics, E. Eberlein and M.S. Taqqu (eds.), Birkhäuser, Boston.

Kruskal, J.B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.

Kruskal, J.B. (1964b). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-129.

Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Sage, London.

Krzanowski, W.J. (1988). Principles of Multivariate Analysis. Oxford University Press, Oxford.

Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.

Lanciani, A. (2001). Mathématiques et musique: les labyrinthes de la phénoménologie. Editions Jérôme Millon, Grenoble.

Lauter, H. (1985). An efficient estimator for the error rate in discriminant analysis. Statistics, Vol. 16, 107-119.

Lamperti, J.W. (1962). Semi-stable stochastic processes. Trans. American Math. Soc., Vol. 104, 62-78.

Lamperti, J.W. (1972). Semi-stable Markov processes. Z. Wahrsch. verw. Geb., Vol. 22, 205-225.

LeBlanc, M. and Tibshirani, R. (1996). Combining estimates in regression and classification. JASA, Vol. 91, 1641-1650.

Lendvai, E. (1993). Symmetries of Music. Kodaly Institute, Kecskemet.

Levinson, S.E., Rabiner, L.R., and Sondhi, M.M. (1983). An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell Systems Tech. J., Vol. 62, 1035-1074.

Lewin, D. (1987). Generalized Musical Intervals and Transformations. Yale University Press, New Haven/London.

Leyton, M. (2001). A Generative Theory of Shape. Springer, New York.

Licklider, J.R.C. (1951). A duplex theory of pitch perception. Experientia, Vol. 7, 128-134.

Ligges, U., Weihs, C., Hasse-Becker, P. (2002). Detection of locally stationary segments in time series. In: Proceedings in Computational Statistics, W. Härdle, B. Rönz (Eds.), pp. 285-290.

Lindley, M. and Turner-Smith, R. (1993). Mathematical Models of Musical Scales. Verlag für systematische Musikwissenschaft GmbH, Bonn.

Lingoes, J.C. and Roskam, E.E. (1973). A mathematical and empirical analysis of two multidimensional scaling algorithms. Psychometrika, 38, Monograph Suppl. No. 19.

MacDonald, I.L. and Zucchini, W. (1997). Hidden Markov and Other Models for Discrete-valued Time Series. Chapman & Hall, London.

Mallat, S. (1998). A Wavelet Tour of Signal Processing. Academic Press, London.

Mandelbrot, B.B. (1953). Contribution à la théorie mathématique des jeux de communication. Publs. Inst. Statist. Univ. Paris, Vol. 2, Fasc. 1 et 2, 3-124.

Mandelbrot, B.B. (1956). An outline of a purely phenomenological theory of statistical thermodynamics: I. canonical ensembles. IRE Trans. on Inform. Theory, Vol. IT-2, 190-203.

Mandelbrot, B.B. (1977). Fractals: Form, Chance and Dimension. Freeman & Co., San Francisco.

Mandelbrot, B.B. (1983). The Fractal Geometry of Nature. Freeman & Co., San Francisco.

Mandelbrot, B.B. and van Ness, J.W. (1968). Fractional Brownian motions, fractional noises and applications. SIAM Review, Vol. 10, No. 4, 422-437.

Mandelbrot, B.B. and Wallis, J.R. (1969). Computer experiments with fractional Gaussian noises. Water Resour. Res., Vol. 5, No. 1, 228-267.

Mardia, K.V. (1972). Statistics of Directional Data. Academic Press, London.

Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979). Multivariate Analysis. Academic Press, London.

Markuse, B. and Schneider, A. (1995). Ähnlichkeit, Nähe, Distanz: Zur Anwendung multidimensionaler Skalierung in musikwissenschaftlichen Untersuchungen. In: Festschrift für Jobst Peter Fricke zum 65. Geburtstag, W. Auhagen, B. Gätjen and K. Niemöller (Eds.), Musikwissenschaftliches Institut der Universität zu Köln (http://www.uni-koeln.de/phil-fak/muwi/publ/fs fricke/festschrift.html).

Matheron, G. (1973). The intrinsic random functions and their applications. Adv. Appl. Prob., Vol. 5, 439-468.

Mathieu, E. (1861). Mémoire sur l'étude des fonctions de plusieurs quantités. J. Math. Pures Appl., Vol. 6, 241-243.

Mathieu, E. (1873). Sur la fonction cinq fois transitive de 24 quantités. J. Math. Pures Appl., Vol. 18, 25-46.

Mazzola, G. (1985). Gruppen und Kategorien in der Musik. Heldermann-Verlag, Berlin.

Mazzola, G. (1990a). Geometrie der Töne. Birkhäuser, Basel.

Mazzola, G. (1990b). Synthesis. SToA music 1001.90, Zurich.

Mazzola, G. (1989/1994). Presto. SToA music, Zurich.

Mazzola, G. (2002). The Topos of Music. Birkhäuser, Basel.

Mazzola, G. and Beran, J. (1998). Rational composition of performance. In: Controlling Creative Processes in Music, W. Auhagen, R. Kopiez (Eds.), Staatliches Institut für Musikforschung (Berlin), Lang Verlag, Frankfurt/New York.

Mazzola, G., Zahorka, O. and Stange-Elbe, J. (1995). Analysis and Performance of a Dream. In: Proceedings of the 1995 Symposium on Musical Performance, J. Sundberg (ed.), KTH, Stockholm.

McLachlan, G.J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York.

McMillan, B. (1953). The basic theorems of information theory. Ann. Math. Statistics, 24, 196-219.

Meyer, Y. (1992). Wavelets and Operators. Cambridge University Press, Cambridge.

Meyer, Y. (1993). Wavelets: Algorithms and Applications. SIAM, Philadelphia, PA.

Morris, R.D. (1987). Composition with Pitch-Classes. Yale University Press, New Haven.

Morris, R.D. (1995). Compositional spaces and other territories. PNM, 33, 328-358.

Morse, P.M. and Ingard, K.U. (1968). Theoretical Acoustics. McGraw-Hill. (Reprinted by Princeton University Press, 1986.)

Mosteller, F. and Tukey, J.W. (1977). Data Analysis and Regression. Addison-Wesley, Reading, MA.

Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and its Applications, Vol. 9, 141-142.

Nederveen, C.J. (1998). Acoustical Aspects of Woodwind Instruments. Northern Illinois University Press, DeKalb.

Nettheim, N. (1997). A Bibliography of Statistical Applications in Musicology. Musicology Australia, Vol. 20, 94-106.

Newton, H.J. and Pagano, M. (1983). A method for determining periods in time series. JASA, Vol. 78, 152-157.

Noll, T. (1997). Harmonische Morpheme. Musikometrika, Vol. 8, 7-32.

Norden, H. (1964). Proportions in Music. Fibonacci Quarterly, Vol. 2, 219.

Norris, J.R. (1998). Markov Chains. Cambridge University Press, Cambridge.

Ogden, R.T. (1996). Essential Wavelets for Statistical Applications and Data Analysis. Birkhäuser, Boston.

Orbach, J. (1999). Sound and Music. University Press of America, Lanham, MD.

Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Statistics, Vol. 33, 1065-1076.

Peitgen, H.-O. and Saupe, D. (1988). The Science of Fractal Images. Springer, New York.

Percival, D.B. and Walden, A.T. (2000). Wavelet Methods for Time Series Analysis. Cambridge University Press, Cambridge, UK.

Perle, G. (1955). Symmetric formations in the string quartets of Béla Bartók. Music Review, 16, 300-312.

Pierce, J.R. (1983). The Science of Musical Sound. Scientific American Books, New York (2nd ed. printed by W.H. Freeman & Co., 1992).

Plackett, R.L. (1960). Principles of Regression Analysis. Clarendon Press, Oxford.

Polzehl, J. (1995). Projection pursuit discriminant analysis. Computational Statist. Data Anal., Vol. 20, 141-157.

Price, B.D. (1969). Mathematical groups in campanology. Math. Gaz., 53, 129-133.

Priestley, M.B. (1965). Evolutionary spectra and non-stationary processes. J. R. Statist. Soc., Series B, Vol. 27, 204-237.

Priestley, M.B. (1981a). Spectral Analysis and Time Series, Vol. 1: Univariate Time Series. Academic Press, New York.

Priestley, M.B. (1981b). Spectral Analysis and Time Series, Vol. 2: Multivariate Series, Prediction and Control. Academic Press, New York.

Quinn, B.G. and Thomson, P.J. (1991). Estimating the frequency of a periodic function. Biometrika, Vol. 78, No. 1, 65-74.

Rahn, J. (1980). Basic Atonal Theory. Longman, New York.

Raichel, D.R. (2000). The Science and Applications of Acoustics. American Inst. of Physics, College Park, PA.

Ramsay, J.O. (1977). Maximum likelihood estimation in multidimensional scaling. Psychometrika, 42, 241-266.

Raphael, C.S. (1999). Automatic segmentation of acoustic music signals using hidden Markov models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 4, 360-370.

Raphael, C.S. (2001a). A probabilistic expert system for automatic musical accompaniment. J. Computational Graphical Statist., Vol. 10, No. 3, 487-512.

Raphael, C.S. (2001b). Synthesizing musical accompaniment with Bayesian belief networks. J. New Music Res., Vol. 30, No. 1, 59-67.

Rao, C.R. (1973). Linear Statistical Inference and its Applications (2nd ed.). Wiley & Sons, New York.

Rayleigh, J.W.S. (1896). The Theory of Sound (2 vols), 2nd ed., Macmillan, London (Reprinted by Dover, 1945).

Read, R.C. (1997). Combinatorial problems in the theory of music. Discrete Mathematics, 167/168, 543-551.

Reiner, D. (1985). Enumeration in music theory. American Math. Monthly, 92/1, 51-54.

Renyi, A. (1959a). On the dimension and entropy of probability distributions.Acta Mathe. Acad. Sci. Hung., Vol. 10, 193-215.

Renyi, A. (1959b). On a theorem of P. Erdos and its applications in informationtheory. Mathematica Cluj, Vol. 1, No. 24, 341-344.

Renyi, A. (1961). On measures of entropy and information. Proc. Fourth BerkeleySymposium on Math. Stat. Prob., Vol. I, Univ. California Press, Berkeley, 547-561.

Renyi, A. (1965). On foundations of information theory. Review of the International Statistical Institute, Vol. 33, 1-14.

Renyi, A. (1970). Probability Theory. North Holland, Amsterdam.

Repp, B. (1992). Diversity and commonality in music performance: An analysis of timing microstructure in Schumann’s “Traumerei”. J. Acoust. Soc. Am., 92, 2546-2568.

Rigden, J.S. (1977). Physics and the Sound of Music. Wiley, New York.

Ripley, B. (1995). Pattern Recognition and Neural Networks. Cambridge Univer-sity Press, Cambridge.

Rodet, X. (1997). Musical sound signals analysis/synthesis: sinusoidal+residual and elementary waveform models. Appl. Signal Processing, 4, 131-141.

Roederer, J.G. (1995). The Physics and Psychophysics of Music. Springer, Berlin/New York.

Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statistics, Vol. 27, 832-837.

Rossing, T.D. (ed.) (1984). Acoustics of Bells. Van Nostrand Reinhold, New York.

Rossing, T.D. (1990). The Science of Sound (2nd ed.). Addison-Wesley, Reading, MA.

Rossing, T.D. (2000). Science of Percussion Instruments. World Scientific, London.

Rossing, T.D. and Fletcher, N.H. (1995). Principles of Vibration and Sound. Springer, Berlin/New York.

Rotman, J.J. (2002). Advanced Modern Algebra. Prentice Hall, New Jersey.

Rousseeuw, P. and Yohai, V.J. (1984). Robust regression by means of S-estimators. In: Robust Nonlinear Time Series Analysis, J. Franke, W. Hardle, and D. Martin (Eds.), Lecture Notes in Statistics, Vol. 26, 256-277, Springer, New York.

Ruppert, D. and Wand, M.P. (1994). Multivariate locally weighted least squares regression. Ann. Statistics, Vol. 22, 1346-1370.

Ryan, T.P. (1997). Modern Regression Methods. Wiley, New York.

Scheffe, H. (1959). The Analysis of Variance. Wiley, New York.

Schnitzler, G. (1976). Musik und Zahl. Verlag für systematische Musikwissenschaft, Bonn.

Schonberg, A. (1950). Die Komposition in 12 Tonen. In: Style and Idea, New York.

Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist., Vol. 6,461-464.

Seber, G.A.F. (1984). Multivariate Observations. Wiley, New York.

Serra, X. and Smith, J.O. (1991). Spectral modeling synthesis: A sound analysis/synthesis system based on deterministic plus stochastic decomposition. Computer Music J., Vol. 14, No. 4, 12-24.

Shannon, C.E. (1948). A mathematical theory of communication. Bell Syst. Techn. J., Vol. 27, 379-423.

Shannon, C.E. and Weaver, W. (1949). The Mathematical Theory of Communication. Univ. Illinois Press, Urbana.

Shepard, R.N. (1962a). The analysis of proximities: multidimensional scaling with unknown distance function. Part I. Psychometrika, 27, 125-140.

Shepard, R.N. (1962b). The analysis of proximities: multidimensional scaling with unknown distance function. Part II. Psychometrika, 27, 219-246.

Schiffman, S. (1997). Introduction to Multidimensional Scaling: Theory, Methods, and Applications. Academic Press, New York.

Shumway, R. and Stoffer, D.S. (2000). Time Series Analysis and Its Applications.Springer, New York.

Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.

Simonoff, J.S. (1996). Smoothing Methods in Statistics. Springer, New York.

Sinai, Y.G. (1976). Self-similar probability distributions. Theory Probab. Appl.,Vol. 21, 64-80.

Slaney, M. and Lyon, R.F. (1991). Apple Hearing Demo Reel. Apple Technical Report No. 25, Apple Computer Inc., Cupertino, CA.

Solo, V. (1992). Intrinsic random fluctuations. SIAM Appl. Math., Vol. 52, 270-291.

Solomon, L.J. (1973). Symmetry as a determinant of musical composition. PhD thesis, University of West Virginia.

Srivastava, M. and Sen, A.K. (1997). Regression Analysis: Theory, Methods andApplications. Springer, New York.

Stamatatos, E. and Widmer, G. (2002). Music performer recognition using an ensemble of simple classifiers. Austrian Research Institute for Artificial Intelligence, Vienna, TR-2002-02.

Stange-Elbe, J. (2000). Analyse- und Interpretationsperspektiven zu J.S. Bachs “Kunst der Fuge” mit Werkzeugen der objektorientierten Informationstechnologie. Habilitation thesis, University of Osnabruck.

Steinberg, R. (ed.) (1995). Music and the Mind Machine. Springer, Heidelberg.

Stewart, I. (1992). Another Fine Math You’ve Got Me Into... W. H. Freeman.

Stoyan, D. and Stoyan, H. (1994). Fractals, Random Shapes and Point Fields:Methods of Geometrical Statistics. Wiley, New York.

Straub, H. (1989). Beitrage zur modultheoretischen Klassifikation musikalischer Motive. Diploma thesis, ETH Zurich.

Taylor, R. (1999a). Fractal analysis of Pollock’s drip paintings. Nature, Vol. 399, p. 422.

Taylor, R. (1999b). Fractal Expressionism. Physics World, Vol. 12, No. 10, p. 25.

Taylor, R. (1999c). Fractal expressionism: where art meets science. In: Art and Complexity, J. Casti (ed.), Perseus Press.

Taylor, R. (2000). The use of science to investigate Jackson Pollock’s drip paintings. Art and the Brain, Journal of Consciousness Studies, Vol. 7, No. 8-9, p. 137.

Telcs, A. (1990). Spectra of graphs and fractal dimensions. Probab. Th. Rel.Fields, Vol. 82, 435-449.

Thumfart, A. (1995). Discrete Evolutionary Spectra and their Application to a Theory of Pitch Perception. StatLab Heidelberg, Beitrage zur Statistik, No. 30.

Tricot, C. (1995). Curves and Fractal Dimension. Springer, New York.

Tufte, E. (1983). The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT.

Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.

Tukey, P.A. and Tukey, J.W. (1981). Graphical display of data sets in 3 or more dimensions. In: Interpreting Multivariate Data, V. Barnett (ed.), Wiley, Chichester, UK.

Ueda, K. and Ohgushi, K. (1987). Perceptual components of pitch: spatial representation using a multidimensional scaling technique. J. Acoust. Soc. Am., 82, 1193-1200.

Velleman, P. and Hoaglin, D. (1981). The ABC’s of EDA: Applications, Basics,and Computing of Exploratory Data Analysis. Duxbury, Belmont, CA.

Vidakovic, B. (1999). Statistical Modeling by Wavelets. John Wiley, New York.

Voss, R.F. and Clarke, J. (1975). 1/f noise in music and speech. Nature, Vol. 258,317-318.

Voss, R.F. and Clarke, J. (1978). 1/f noise in music: music from 1/f noise. J.Acoust. Soc. America, Vol. 63, 258-263.

Voss, R.F. (1988). Fractals in nature: From characterization to simulation. In: The Science of Fractal Images, H.-O. Peitgen and D. Saupe (Eds.), Springer, Berlin, pp. 26-69.

Vuza, D.T. (1991). Supplementary sets and regular complementary unending canons (part one). Persp. New Music, Vol. 29, No. 2, 22-49.

Vuza, D.T. (1992a). Supplementary sets and regular complementary unending canons (part two). Persp. New Music, Vol. 30, No. 1, 184-207.

Vuza, D.T. (1992b). Supplementary sets and regular complementary unending canons (part three). Persp. New Music, Vol. 30, No. 2, 102-125.

Vuza, D.T. (1993). Supplementary sets and regular complementary unending canons (part four). Persp. New Music, Vol. 31, No. 1, 270-305.

van der Waerden, B.L. (1979). Die Pythagoreer. Artemis, Zurich.

Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.

Walker, A.M. (1971). On the estimation of a harmonic component in a time series with stationary independent residuals. Biometrika, Vol. 58, 21-36.

Walmsley, P.J., Godsill, S.J. and Rayner, P.J.W. (1999). Bayesian graphical models for polyphonic pitch tracking. In: Diderot Forum on Mathematics and Music: Computational and Mathematical Methods in Music, Vienna, Austria, December 2-4, 1999, H. G. Feichtinger and M. Dörfler (eds.), Österreichische Computergesellschaft.

Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman and Hall,London.

Watson, G. (1964). Smooth regression analysis. Sankhya, Series A, Vol. 26, 359-372.

Watson, G. (1983). Statistics on Spheres. Wiley, New York.

Waugh, W.A. (1996). Music, probability, and statistics. In: Encyclopedia of Statistical Sciences, S. Kotz, C. B. Read, and D.L. Banks (Eds.), 6, 134-137.

Webb, A.R. (2002). Statistical Pattern Recognition (2nd ed.). Wiley, New York.

Wedin, L. (1972). Multidimensional scaling of emotional expression in music. Svensk Tidskrift for Musikforskning, 54, 115-131.

Wedin, L. and Goude, G. (1972). Dimension analysis of the perception of musical timbre. Scand. J. Psychol., 13, 228-240.

Weihs, C., Berghoff, S., Hasse-Becker, P. and Ligges, U. (2001). Assessment of purity of intonation in singing presentations by discriminant analysis. In: Mathematical Statistics and Biometrical Applications, J. Kunert and G. Trenkler (Eds.), pp. 395-410.

White, A.T. (1983). Ringing the changes. Math. Proc. Camb. Phil. Soc., 94, 203-215.

White, A.T. (1985). Ringing the changes II. Ars Combinatorica, 20-A, 65-75.

White, A.T. (1987). Ringing the cosets. American Math. Monthly, 94/8, 721-746.

Whittle, P. (1953). Estimation and information in stationary time series. Ark. Mat., Vol. 2, 423-434.

Widmer, G. (2001). Discovering Simple Rules in Complex Data: A Meta-learning Algorithm and Some Surprising Musical Discoveries. Austrian Research Institute for Artificial Intelligence, Vienna, TR-2001-31.

Wiener, N. (1948). Cybernetics or Control and Communication in the Animal and the Machine. Act. Sci. Indust., No. 1053, Hermann et Cie, Paris.

Wilson, W.G. (1965). Change Ringing. October House Inc., New York.

Wolfowitz, J. (1957). The coding of messages subject to chance errors. Illinois J. Math., Vol. 1, 591-606.

Wolfowitz, J. (1958). Information theory for mathematicians. Ann. Math. Statistics, Vol. 29, 351-356.

Wolfowitz, J. (1961). Coding Theorems of Information Theory. Springer, Berlin.

Woodward, P.M. (1953). Probability and Information Theory with Applications to Radar. Pergamon Press, London.

Xenakis, I. (1971). Formalized Music: Thought and Mathematics in Composition. Indiana University Press, Bloomington/London.

Yaglom, A.M. and Yaglom, I.M. (1967). Wahrscheinlichkeit und Information. Deutscher Verlag der Wissenschaften, Berlin.

Yost, W.A. (1977). Fundamentals of Hearing: An Introduction. Academic Press, San Diego.

Yohai, V.J. (1987). High breakdown-point and high efficiency robust estimates for regression. Ann. Statistics, Vol. 15, 642-656.

Yohai, V.J., Stahel, W.A., and Zamar, R. (1991). A procedure for robust estimation and inference in linear regression. In: Directions in Robust Statistics and Diagnostics, Part II, W.A. Stahel and S.W. Weisberg (Eds.), Springer, New York.

Young, G. and Householder, A. S. (1941). A note on multidimensional psychophysical analysis. Psychometrika, 6, 331-333.

Zassenhaus, H.J. (1999). The Theory of Groups. Dover, Mineola.

Zivot, E. and Wang, J. (2002). Modeling Financial Time Series with S-Plus. Springer, New York.
