© Michael R. W. Dawson 2014

CHAPTER 9:

EXPLORING THE II-V-I CHORD PROGRESSION

The purpose of this chapter is to explore an important source of tonality in Western music, a musically related sequence of chords called a chord progression. The chapter begins by discussing chord progressions in the context of establishing tonality. It then describes an important chord progression in jazz called the ii-V-I progression. The goal of this chapter is to train a number of different networks on this progression; when provided one chord, the network is trained to respond with the next chord in the progression that should be played. While all of the networks in the chapter are trained on the same jazz progression problem, different networks use different codes to represent input and output chords. The question of interest is whether the choice of encoding has any effect on the ease of discovering a solution to the progression problem. We present results that show that it does: when an abstract encoding is employed, a multilayer perceptron with many hidden units is required to learn the chord progression. In contrast, when different encodings are used the same problem can be learned by a perceptron that has no hidden units at all. We illustrate a practical implication of this by showing that a simple network can be easily interpreted. We end the chapter by pointing out that other factors might influence encoding selection; depending on the goals of a simulation, one may not always be seeking the encoding that leads to the simplest network solution.

9.1 Tonality and Chord Progressions
9.2 The ii-V-I Progression
9.3 The Importance of Encodings
9.4 Four Encodings of the ii-V-I Problem
9.5 Simulations With Pitch-Class Encoding
9.6 Simulations Using Pitch Encodings of Root Forms
9.7 Simulations Using Pitch Encodings of Inverted Forms
9.8 Simulations Using Lead Sheet Encodings
9.9 Interpreting A Lead Sheet Perceptron
9.10 A Progression of Progressions
9.11 Summary and Implications
9.12 References


9.1 Tonality and Chord Progressions

Tonality is a central characteristic of Western music (Piston, 1962; Schoenberg, 1969). Tonality is the sense that a composition belongs to a particular musical key, and has been a topic central to many of the earlier parts of this book, including Chapter 3 on identifying scale tonics, Chapter 4 on identifying scale modes, and Chapter 5 on key finding.

In what is known as the era of ‘common practice’ in Western classical music, which spans the 18th and 19th centuries, establishing a musical key almost always meant establishing a major or minor key of the type that we have already encountered in previous chapters. Although other modes can and have been used in Western compositions, they are essentially ignored by common practice’s focus on the ‘major-minor system’ of music. “We are so imbued with this tradition that we tend to interpret music based on other modes as being in either major or minor, usually with somewhat unsatisfactory results” (Piston, 1962, p. 30).

In our earlier chapters on the modality and tonics of musical scales we noted that one could define a musical key by constraining pitch-class choice: specifically, by only using tones that belonged to a particular scale. For instance, to set a composition in the key of C major a composer would only select its tones from the C major scale, and not employ tones that do not belong to this scale.

However, establishing tonality is more complex than merely restricting the use of particular tones. The tones that define a musical scale have an organized relationship to one another, relationships that even listeners with no musical training are aware of (Krumhansl, 1990). Because of these relationships, different tones in a scale – typically identified by their scale degree (i.e. the Roman numerals used earlier in Figures 4-11 and 7-15), such as I for the tonic, IV for the subdominant, and V for the dominant – have specific tonal functions.

For example, “dominant and subdominant seem to give an impression of balanced support of the tonic, like two equidistant weights on either side of a fulcrum” (Piston, 1962, p. 31). Alternatively, some tones are intrinsically unstable in a particular musical key; their presence produces a musical tension that demands resolution in the form of hearing a more stable musical entity.

Establishing tonality, then, also requires communicating and exploiting the various relationships amongst tones in a musical key. “It is more a process of setting forth the organized relationship of these tones to one among them which is to be the tonal center” (Piston, 1962, p. 31). In common practice the process of establishing tonality and its structure is accomplished by using harmony.

The basic element of harmony is the musical interval, the simultaneous presence of two tones a specific musical distance apart. Chords involve the presentation of more than two simultaneous tones, and therefore the presence of more than one musical interval. Earlier in Chapter 7 we saw how one could construct particular chords (triads and tetrachords) on each of the degrees of a musical scale (Figure 7-15). Accounts of harmony often begin by considering the use of, and the relationship between, triads (Piston, 1962; Schoenberg, 1969).

Just as the presence of a single tone cannot by itself establish a musical key, the occurrence of a triad in isolation cannot establish tonality. “A triad standing alone is entirely indefinite in its harmonic meaning; it may be the tonic of one tonality or one degree of several others” (Schoenberg, 1969, p. 1). In order for tonality to be established, a succession of triads must be presented. The succession must be structured so that the relationship from one triad to the next takes the listener to an intended goal. Such a structured succession of chords is called a chord progression. In jazz a chord progression is often called ‘the changes’.

There are two strongly related aspects to creating a chord progression: establishing a sequence of chord roots, and defining the structure of each chord to establish appropriate voice leading. Let us briefly consider these two aspects in turn.

Defining a particular succession of chords requires considering the succession of chord roots independently of the form (the inversion) of each chord. “Chord succession can be reduced to root succession (or root progression), which in turn can be translated into Roman numerals representing a succession of scale degrees” (Piston, 1962, p. 18). Piston notes that common practice reveals a set of typical root progressions which are summarized in Table 9-1. Each row of this table provides the root of the current chord in a progression, and notes the root of the next chord that typically follows, that sometimes follows, and that less often follows. For instance, its first row can be interpreted as “If the current chord has I as its root, it is typically followed by a IV or by a V chord, it is sometimes followed by a VI chord, and it is less often followed by a II or III chord.”

Root Of          Typically      Sometimes      Less Often
Current Chord    Followed By    Followed By    Followed By
I                IV or V        VI             II or III
II               V              VI             I, III, or IV
III              VI             IV             II or V
IV               V              I or II        III or VI
V                I              VI or IV       III or II
VI               II or V        III or IV      I
VII              III

Table 9-1. The usual progression of chord roots in common practice. See text for details.
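Table 9-1 can be read as a lookup from the root of the current chord to the roots that tend to follow it. The Python sketch below only illustrates that reading; the names ROOT_PROGRESSIONS and describe_row are hypothetical and are not taken from Piston or from the simulations reported in this chapter.

```python
# Table 9-1 as a dictionary. Keys are the root of the current chord;
# values list the roots of the chords that follow it, grouped by likelihood.
# All names here are illustrative assumptions, not the book's own code.
ROOT_PROGRESSIONS = {
    "I":   {"typically": ["IV", "V"], "sometimes": ["VI"], "less often": ["II", "III"]},
    "II":  {"typically": ["V"], "sometimes": ["VI"], "less often": ["I", "III", "IV"]},
    "III": {"typically": ["VI"], "sometimes": ["IV"], "less often": ["II", "V"]},
    "IV":  {"typically": ["V"], "sometimes": ["I", "II"], "less often": ["III", "VI"]},
    "V":   {"typically": ["I"], "sometimes": ["VI", "IV"], "less often": ["III", "II"]},
    "VI":  {"typically": ["II", "V"], "sometimes": ["III", "IV"], "less often": ["I"]},
    "VII": {"typically": ["III"], "sometimes": [], "less often": []},
}


def describe_row(root):
    """Return a sentence that reads one row of the table aloud."""
    row = ROOT_PROGRESSIONS[root]
    parts = []
    for label in ("typically", "sometimes", "less often"):
        if row[label]:  # skip empty cells, as in the VII row
            parts.append(f"{label} followed by {' or '.join(row[label])}")
    return f"{root} is " + ", ".join(parts) + "."


print(describe_row("I"))
```

In this table II is typically followed by V, and V is typically followed by I, so a II to V to I succession chains two of the most typical root movements of common practice.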

Why do the root progressions summarized in Table 9-1 emerge from common practice? The reason is that the relations amongst chords with these roots in terms of a particular tonal center are such that they instill a particular musical direction to a listener. For instance, the progression from V to I defines the root progression of what is called the perfect cadence, which is a musical phrase that produces a satisfying dispersal of tension that can be used to signify the end of a phrase or of a composition. A similar, but less satisfying, effect is produced by the plagal cadence that proceeds from IV to I.

Other chord progressions intensify tension instead of relieving it. For example, an imperfect cadence or a half cadence ends on a V chord, and can be preceded by any of a number of different chords (e.g. IV or I). The tension produced by ending on the V chord provides a clear signal that further music is coming. It is “like a comma, indicating a partial stop in an unfinished statement” (Piston, 1962, p. 60).

Root progressions can also be used to perform other functions, such as modulating from one musical key to another (Schoenberg, 1969). We saw earlier in Chapter 4 that different musical scales are similar to one another because they share many tones. As a result the same chord can be found in more than one musical key; these are called common chords.

For instance, A minor is a common chord found in both the key of C major (where it is built on the VI scale degree) and in the key of G major (where it is built on the II scale degree). One can therefore use A minor as a pivot chord in a cadence that modulates the key of a composition from C major to G major.

Root successions are only one aspect of a chord progression. A second strongly related aspect is voice leading (see the earlier discussion of this term in Section 4.5.3). In choral music different voices perform the component notes of each chord in a progression. In addition to defining the succession of chord roots in this progression, a composer of choral music must also decide which voice is to move from one tone in the first chord to another tone in the second.

Common practice adopts principles that lead to efficient voice leading, which attempts to minimize the musical distance travelled by each voice as it moves from tone to tone in successive chords (Tymoczko, 2006, 2008, 2011). In compositional terms, this means choosing the form of each chord – in particular, the inversion of each chord (see the discussion of Figure 6-1) – that leads to the most efficient voice leading.

Although the notion of efficient voice leading is typically framed in the context of choral music, it plays an important role in other kinds of composition as well. For instance, when performing chord progressions on a piano, efficient voice leading translates into using chord forms that minimize finger movements from one chord to the next (Sudnow, 1978). We will see later in this chapter, and again in Chapter 10, that using efficient voice leading to motivate the encoding of musical problems for a network can have a profound impact on network simplicity.

The harmonic structures of common practice are not restricted to classical music. Popular music also uses these principles, although typically in simpler form (e.g. Schoenberg, 1969, p. 2). Chord progressions also define the structure of most jazz pieces (Sudnow, 1978); a recent study examines ‘lead sheets’ – an abbreviated notation of chord progressions – that define a jazz repertory of 1,186 songs (Broze & Shanahan, 2013).

The harmonic structure of jazz can be as complex as that found in common practice. Indeed, there exist strong relationships between the harmonic structures of jazz and of classical music.

For instance, radical new harmonies were introduced to jazz in the 1940s and 1950s by such be-bop pioneers as Thelonious Monk, Charlie Parker, and Dizzy Gillespie (Kelley, 2009). However, analyses of the structure of this music reveal the same tonal hierarchies that are the foundation of common practice harmony (Jarvinen, 1995). “The underlying structures of two different-sounding pieces of music, for example a Schubert lied and an improvisation by Hank Mobley, share a remarkably similar tonal hierarchy” (Jarvinen, 1995, p. 435).

Similarly, a casual listen to the free-form improvisations of saxophonist John Coltrane does not reveal traditional harmonic structure. His music can easily be described as a radical departure from be-bop (Porter, 1998). However, careful analysis of Coltrane’s music reveals structures that are inspired by classical music (Demsey, 1991). In particular, Demsey discovered that sections of songs on Coltrane’s seminal Giant Steps album were strongly related to exercises in Nicolas Slonimsky’s Thesaurus of Scales and Melodic Patterns (Slonimsky, 1947). This book was written for an intended audience of classical composers, but was part of Coltrane’s daily practice regimen in the 1950s. “Slonimsky may be the most direct link between John Coltrane and structural principles of the late nineteenth century” (Demsey, 1991, p. 155).

The current chapter presents our first attempt at examining harmonic structure in the context of successions of chords. It does so by exploring a common and important jazz chord progression called the ii-V-I progression. Artificial neural networks learn this progression in the sense that when presented some chord (in a particular musical key) that belongs to the progression, the network will respond with the next chord (in the same musical key) that belongs to the progression.

The current chapter uses the ii-V-I progression to introduce another key idea that must be considered when using artificial neural networks to explore music: encoding. In any musical task, a researcher must make design decisions about how to represent musical stimuli for a network, and how to represent the musical responses of a network. A primary goal of the current chapter is to show that the choice of an encoding can substantially impact the nature of the network that learns the chord progression.

Recall from Sections 3.5 and 4.1 that the complexity of a classification problem is reflected in the complexity of the network that learns to perform the classification. Simple classification problems, such as the identification of scale tonics, can be performed by perceptrons which do not contain hidden units. More complex classification problems, such as the identification of scale mode, require more complicated networks that include hidden units (i.e. multilayer perceptrons).

We will see that when one encoding of the ii-V-I progression problem is used, a fairly complicated multilayer perceptron is required for its solution. However, if the identical jazz progression problem is encoded in a different fashion then a simpler network can solve the problem.



9.2 The ii-V-I Progression

9.2.1 Three Tetrachords Per Key

The topic of the current chapter is a succession of chords called the ii-V-I chord progression. This progression is extremely important and popular; it is likely the most commonly encountered in jazz (Levine, 1989). In its most basic form this progression involves three different tetrachords, each defined in the same musical key; as a result the ii-V-I progression can be written for each of the twelve different major keys in Western music. The three chords in any of these versions of the progression are constructed using particular notes in a major scale as their root; the scale used defines the key of the progression.

The first tetrachord in the ii-V-I progression is the minor seventh chord constructed using the second note of the major scale as its root. This is the ii chord; its Roman numeral name is written in lower case because it is minor, and also indicates the position of the chord’s root in the major scale for the chord’s musical key. For instance, the second note in the C major scale is D, so the ii tetrachord for the key of C is Dm7 which includes the notes D, F, A and C.

The second tetrachord in the ii-V-I progression is the dominant seventh tetrachord constructed using the fifth note of its key’s major scale as its root. In the C major scale this note is G, so in the key of C the V chord in the progression is G7 which uses the notes G, B, D, and F.

The third tetrachord in the ii-V-I progression is the major seventh tetrachord constructed using the first note of its key’s major scale as its root. In the C major scale this note is C, so in the key of C the I chord in the progression is Cmaj7 which contains the notes C, E, G, and B.

The procedure described above for constructing the three chords in the key of C is used to construct the ii-V-I progression using any other major scale. Table 9-2 provides the three chords in this progression for each of the possible major keys in Western music. Figure 9-1 presents a musical score which represents the chords in this progression in every key, with each chord in its root position.

Key          ii Chord         V Chord         I Chord
A            Bm7              E7              Amaj7
A# or B♭     Cm7              F7              A#maj7 or B♭maj7
B            C#m7 or D♭m7     F#7 or G♭7      Bmaj7
C            Dm7              G7              Cmaj7
C# or D♭     D#m7 or E♭m7     G#7 or A♭7      C#maj7 or D♭maj7
D            Em7              A7              Dmaj7
D# or E♭     Fm7              A#7 or B♭7      D#maj7 or E♭maj7
E            F#m7 or G♭m7     B7              Emaj7
F            Gm7              C7              Fmaj7
F# or G♭     G#m7 or A♭m7     C#7 or D♭7      F#maj7 or G♭maj7
G            Am7              D7              Gmaj7
G# or A♭     A#m7 or B♭m7     D#7 or E♭7      G#maj7 or A♭maj7

Table 9-2. The three tetrachords that define the ii-V-I progression for each major key. Where appropriate two different enharmonic names of the same chord are provided.
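The construction procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the simulations in this chapter: the names NOTE_NAMES, CHORD_INTERVALS, PROGRESSION and ii_v_i are assumptions introduced here, and for brevity only sharp names are produced (the enharmonic flat names are omitted).

```python
# Pitch-class names, sharps only (an assumption made for brevity).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets above the chord root for each tetrachord type.
CHORD_INTERVALS = {
    "m7": (0, 3, 7, 10),    # minor seventh
    "7": (0, 4, 7, 10),     # dominant seventh
    "maj7": (0, 4, 7, 11),  # major seventh
}

# (semitones from the tonic up to the chord root, chord type) for ii, V and I.
PROGRESSION = [(2, "m7"), (7, "7"), (0, "maj7")]


def ii_v_i(key):
    """Return [(chord name, [note names]), ...] for the ii-V-I in a major key."""
    tonic = NOTE_NAMES.index(key)
    chords = []
    for root_offset, chord_type in PROGRESSION:
        root = (tonic + root_offset) % 12
        notes = [NOTE_NAMES[(root + i) % 12] for i in CHORD_INTERVALS[chord_type]]
        chords.append((NOTE_NAMES[root] + chord_type, notes))
    return chords


for name, notes in ii_v_i("C"):
    print(name, notes)
```

Calling ii_v_i with any of the twelve sharp-named keys yields the corresponding ii, V and I tetrachords.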

The ii-V-I progression is important in jazz compositions for several reasons. First, it establishes tonality. For any major key, the most stable tones are notes I, IV, and V (Krumhansl, 1990), and the most stable chords are the ones built on those three notes. The ii-V-I progression therefore involves two of the most stable pitch-classes of a major key, because it includes chords built using the I and V pitch-classes as roots.

Second, in the perception of chord sequences there are definite preferences for the IV chord to resolve into the V chord, and for the V chord to resolve into the I chord, producing the IV-V-I progression that is common in cadences in classical music (Bharucha, 1984; Jarvinen, 1995; Katz, 1995; Krumhansl, Bharucha, & Kessler, 1982; Rosner & Narmour, 1992). The role of the IV chord in this relationship can also be served by a ii chord because of its minor nature (Steedman, 1984). Thus the ii-V-I progression is a powerful tool for establishing the tonality of a musical piece, operating in an analogous fashion to the IV-V-I.

Third, the ii-V-I progression lends itself to a further progression of chord progressions. That is, the ii-V-I progression in one key leads naturally to the ii-V-I progression in a different key. In particular, it is very easy to move from the last chord of the progression in one key to the first chord of the progression in a key that is a full tone lower. For instance, in the key of C the progression ends with Cmaj7; a performer can easily move from this chord to a Cmin7 which is the first chord of the ii-V-I progression in the key of B♭. As a result, one can move from one key to another, providing variety but also establishing tonality because the same progression is used in different but related keys.
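The arithmetic behind this key-to-key movement is simple: the ii chord is rooted two semitones above its tonic, so the ii chord of the key a full tone lower shares its root with the I chord just played. The sketch below is only an illustration; the names NOTE_NAMES, next_key and ii_root are hypothetical, and sharp names are used throughout (A# is the enharmonic equivalent of B♭).

```python
# Pitch-class names, sharps only (A# stands in for B-flat, and so on).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def next_key(key):
    """Return the major key a full tone (two semitones) below the given key."""
    return NOTE_NAMES[(NOTE_NAMES.index(key) - 2) % 12]


def ii_root(key):
    """Return the root of the ii chord: the second note of the key's major scale."""
    return NOTE_NAMES[(NOTE_NAMES.index(key) + 2) % 12]


# Follow the chain of keys starting from C until it returns to C.
chain = ["C"]
while next_key(chain[-1]) != "C":
    chain.append(next_key(chain[-1]))
print(chain)

# The ii chord of the lower key is always rooted on the tonic of the key above it.
print(all(ii_root(next_key(k)) == k for k in NOTE_NAMES))
```

Starting from C, repeated descent by a full tone visits six keys before returning to C.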

Figure 9-1. The ii-V-I progression for every key with chords represented in root position. Each pair of bars presents the three tetrachords that provide the progression in a particular key; the key is defined by the major seventh chord that ends the progression.



9.2.2 The ii-V-I Progression Problem

In the current chapter we will be interested in training networks to generate the ii-V-I progression in any key; we will not be concerned with building networks to generate a progression of these progressions from key to key.

The ii-V-I progression problem is defined by considering it from the perspective of pattern classification. In pattern classification, a network generates a discrete class name in response to a presented stimulus; the name that it generates classifies the input pattern.

The ii-V-I progression can be viewed as involving exactly this sort of pattern classification. Imagine a situation in which a tetrachord is being presented to a network. Our goal is to have the network classify this input chord by generating a class name. However, in the ii-V-I progression problem, the discrete class name that is output is in fact another tetrachord. In particular, when presented one chord, the network’s task is to generate a representation of the next chord in the progression.

For example, consider the ii-V-I progression in the key of C, which involves the Dmin7, G7, and Cmaj7 chords. We want to train a network so that when Dmin7 is presented to its input units it responds with a representation of G7 in its output units. Similarly, when G7 is presented to its input units, it should generate Cmaj7 in its output units. We want analogous behavior from the network for the other eleven possible musical keys. Each key involves defining two input/output pairs, one involving the minor seventh and the dominant seventh chords, the other involving the dominant seventh and the major seventh chords. A major seventh chord is never used as an input pattern, and when properly trained the network will never generate a minor seventh chord as a response. The entire training set consists of 24 different input/output pattern pairs.
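The structure of this training set can be sketched as follows. The code is an illustration only: it uses chord names as stand-ins for whatever input and output encoding is eventually chosen, and the names NOTE_NAMES and training_pairs are assumptions introduced here rather than part of the original simulations.

```python
# Pitch-class names, sharps only (enharmonic flat names are omitted).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def training_pairs():
    """Return the (input chord, desired output chord) pairs for all twelve keys."""
    pairs = []
    for tonic in range(12):
        ii = NOTE_NAMES[(tonic + 2) % 12] + "m7"  # minor seventh on scale degree 2
        v = NOTE_NAMES[(tonic + 7) % 12] + "7"    # dominant seventh on scale degree 5
        i = NOTE_NAMES[tonic] + "maj7"            # major seventh on the tonic
        pairs.append((ii, v))  # presented ii, respond with V
        pairs.append((v, i))   # presented V, respond with I
    return pairs


pairs = training_pairs()
print(len(pairs))
print(pairs[:2])
```

Each key contributes exactly two pairs, so no major seventh chord ever appears as an input and no minor seventh chord ever appears as a desired output.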

One focus of the current chapter is on the nature of an artificial neural network that can learn the ii-V-I progression problem. A second focus concerns the effects of encoding its tetrachords using different representational formats. We can encode the input and output chords for the ii-V-I progression problem in a number of different ways. Of particular interest in the current chapter is whether the choice of encoding impacts the complexity of the network required to learn the progression.

Before presenting the results of training networks on the ii-V-I progression problem, let us first discuss the importance of encoding, and how there are several different approaches to encoding tetrachords that are worthy of exploration, and which may have an effect on the kind of network required to learn the problem.



9.3 The Importance of Encodings

9.3.1 Dasein and Design

In Being and Time (Heidegger, 1927/1962), philosopher Martin Heidegger explored a fundamental question: what does it mean for an entity to exist? In attempting to answer this question, Heidegger investigated different modes of being, and introduced the concept Dasein, which literally means ‘there-being’, and which is typically translated as Being-in-the-world.

Being-in-the-world is a notion that human existence can only be defined by recognizing that this existence is embedded or immersed in the day-to-day world. “Dasein’s understanding of Being pertains with equal primordiality both to an understanding of something like a ‘world’, and to the understanding of the Being of those entities which become accessible in the world” (Heidegger, 1927/1962, p. 33).

What does it mean for entities in the world to become accessible? Heidegger proposed that part of an agent’s engagement with the world involves using equipment. Equipment consists of entities that are experienced by agents in terms of the potential actions or experiences that they make available. Thus Heidegger’s notion of equipment seems similar to the later notion of affordance central to the ecological theory of perception (Gibson, 1979). According to Gibson (p. 127) “the affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill.”

Heidegger (1927/1962) also introduced the notion of readiness-to-hand as a property of equipment. Readiness-to-hand occurs when agents are properly engaged with equipment. With readiness-to-hand an entity’s affordances are properly experienced, but its other properties (like its physical existence) disappear. It is as if we are able to use a tool to interact with the world, and can experience this use, but the tool itself is invisible to us.

Heidegger’s philosophy plays a founding role in an important school of cognitive science called embodied cognitive science (Winograd & Flores, 1987). Embodied cognitive science attempts to explain cognition by focusing on the intrinsic relationships between agents, their bodies, and the structure of the world (Calvo & Gomila, 2008; Chemero, 2009; Dawson, 2013; Dawson, Dupuis, & Wilson, 2010; Shapiro, 2011, 2014; Varela, Thompson, & Rosch, 1991). Embodied cognitive science can easily be described as a school of thought that opposes many of the key assumptions of connectionist cognitive science (Dawson, 2013).

Heidegger’s concept of readiness-to-hand has played a key role in the debate between different schools of cognitive science (Vera & Simon, 1993; Winograd & Flores, 1987). Furthermore, this concept has served the purpose of finding links between embodied cognitive science and the science of design. Winograd and Flores took readiness-to-hand as evidence of good design; we only become aware of equipment itself when the structural coupling between world, equipment, and agent breaks down. In other words, if we are aware of the existence of a tool, then the tool is poorly designed.

The invisibility of artifacts – the readiness-to-hand of equipment – is also frequently characterized as being evidence of good design (Dourish, 2001; Norman, 1998, 2002, 2004). Winograd and Flores took the goal of designing equipment, such as human-computer interfaces, to be creating artifacts that are invisible to us when they are used. “A successful word processing device lets a person operate on the words and paragraphs displayed on the screen, without being aware of formulating and giving commands” (Winograd & Flores, 1987, p. 164).

9.3.2 Solutions by Design

Readiness-to-hand is not only relevant to artifacts and their design, but is also important to problem-solving. In cognitive science, problem-solving is typically described as searching a problem space (Newell & Simon, 1972). A problem space is a representation of the current state of knowledge about a problem. Applying a rule to change our knowledge of the problem is equivalent to moving to a new location (from one state of knowledge to the next) in the problem space. Solving a problem involves searching the problem space to find a route that moves one from the problem’s initial state through intermediate states of knowledge, finally ending at a solution to the problem.

The amount of time required to search through a problem space to find a route to the problem’s solution reflects problem difficulty. The longer the search, the harder the problem.

Crucially, search complexity depends in part upon the manner in which states of knowledge about the problem are encoded. If a problem is encoded using one representational scheme, then its solution may require a long and difficult search. However, if the same problem is encoded in a different format, then its difficulty can be drastically reduced.

For example, consider this famous anecdote that emerged from Thomas Edison’s laboratory in Menlo Park (Josephson, 1961). When initially hired, mathematical physicist Francis Upton’s first task for Edison was to calculate the volume of a pear-shaped glass bulb used for experiments on electric lighting. Upton represented this problem in a format suited for mathematical analysis. “Upton drew the shape of the bulb exactly on paper, and got the equation of its lines, with which he was going to calculate its contents” (Josephson, 1961, p. 193). After an hour, Edison asked Upton for the results, and was told that the mathematician was only halfway done and needed more time. Edison responded that a different representation of the problem would produce faster results: “‘Why’, said Edison, ‘I would simply take that bulb, fill it with a liquid, and measure its volume directly’.”

Edison’s example shows that one can represent a problem in such a way that its solution becomes trivial. That is, with the proper encoding, a problem’s solution exhibits readiness-to-hand: the solution is immediately apparent, and the process of searching for the solution is so trivial that it becomes invisible.

That problem representations make problem solving exhibit readiness-to-hand was central to Herbert Simon’s account of the sciences of the artificial (Simon, 1969). Simon recognized the importance of finding problem representations that worked by revealing solutions effortlessly: “All mathematical derivation can be viewed simply as change in representation, making evident what was previously true but obscure” (Simon, 1969, p. 77).

Simon argued that a great many different disciplines, including cognitive science, were in reality sciences of design because they studied the interface between inner and outer environments. When this interface is optimal, it exhibits readiness-to-hand, and disappears from experience. As a result, “in large part, the proper study of mankind is the science of design” (Simon, 1969, p. 83).

The theme of the current chapter and the next is to explore the connectionist cognitive science of music in the context of efficient design. In particular, it is possible to use many different encodings of the same musical problem. Even though the musical problem remains constant, changing the problem’s encoding can make it much more difficult – or much easier – for a network to learn.



9.4 Four Encodings of the ii-V-I Problem

In designing a training set to be used to teach a network the ii-V-I progression, one must decide how to represent tetrachords both as stimuli and as responses. Ideally, the choice of representation would be ‘theory neutral’ (Pylyshyn, 1984): regardless of our choice of representation, the results of training a network on the task would be the same. Not surprisingly, though, this ideal situation does not arise: different choices of how to represent tetrachords for the network lead to very different simulation results.

Let us first describe four plausible methods for representing tetrachords to networks that must learn the ii-V-I progression. Later in the chapter we will present results that clearly show that these choices are not theory neutral!

9.4.1 Pitch-Class Encoding

Most of the networks that have been described earlier in this book have employed a pitch-class representation, which is the first kind of encoding to consider for the ii-V-I progression. In this representation, only twelve units are required. Each unit represents the presence or absence of one of the possible pitch-classes in Western music.

One major advantage of pitch-class representation is its simplicity: a very small number of input and output units are required to represent any of the different tetrachords that can occur in the progression. A pitch-class representation of the ii-V-I problem requires only 12 input units to represent an input tetrachord, and the same number of output units to represent the tetrachord response generated by the network.

In pitch-class encoding, as we have seen in earlier chapters, a tetrachord stimulus is represented by turning on the four input units that represent the chord’s component pitch-classes, and by turning all of the other eight input units off. For the ii-V-I problem the network uses the same encoding to represent its tetrachord responses in the output units.
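A minimal sketch of this encoding is given below. The ordering of the twelve units is an assumption made for illustration (here the first unit stands for A), as are the names PITCH_CLASSES and encode_pitch_classes; only the principle of four units on and eight units off comes from the text.

```python
# Assumed unit order: one unit per pitch-class, starting from A.
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def encode_pitch_classes(notes):
    """Return a 12-element list: 1 for each pitch-class present, 0 otherwise."""
    return [1 if pc in notes else 0 for pc in PITCH_CLASSES]


# One input/output pattern pair from the key of C: Dm7 is presented, G7 is desired.
dm7_input = encode_pitch_classes(["D", "F", "A", "C"])
g7_output = encode_pitch_classes(["G", "B", "D", "F"])
print(dm7_input)
print(g7_output)
```

Because the code records only which pitch-classes are present, any inversion of a chord produces the same twelve-unit pattern.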

9.4.2 Pitch Encoding

All of the chords in the ii-V-I progressions of Figure 9-1 are presented in root position. That is, the lowest note of each tetrachord in either score is the chord’s root: the lowest note of Dm7 is D, the lowest note of G7 is G, and so on.

One consequence of having each
tetrachord in root position is that there is a marked similarity in chord ‘shape’, which is the spacing between adjacent notes in the chord. Tetrachords of the same type (minor seventh, dominant seventh or major seventh) have very similar shape: four notes that are evenly spaced as they are stacked upon each other on the staff.

One can imagine that the input units
used for pitch-class encoding are the keys of a small piano. The mapping between the input units and the piano keyboard is illustrated in Figure 9-2. However, this mapping reveals a possible disadvantage of pitch-class representation: when this encoding is adopted the similarity of shape between different chords of the same type is necessarily lost. That is, different spacing between notes -- different chord inversions -- is required to fit any of the tetrachords from Figure 9-1 on this keyboard because of its small size.

Figure 9-2. The mapping of the input units used for pitch-class encoding onto a 12-key piano keyboard.

This is demonstrated in Figure 9-3. This figure uses 12-key keyboards to represent four different tetrachords that are of the same type (minor seventh chords) but belong to different keys. Each belongs to the ii-V-I progression in a particular key. However, to fit each of these chords onto the small keyboard, different chord shapes are required.

For example, Figure 9-3 shows that the
Amin7 can be fit on this keyboard in root position (the A is the lowest note, which is the leftmost note colored grey in the illustration). In contrast, Cmin7 must be fit using its first inversion (C is the second lowest note), Dmin7 must be fit using its second inversion (D is the second highest note), and Gmin7 must be fit using its third inversion (G is the highest note).

Figure 9-3. Four different minor seventh tetrachords; notes that belong to the chord are shaded grey. To fit these chords on the 12-key keyboard, four different chord shapes or inversions are required. See text for details.

Using a representation that preserves a tetrachord’s shape could be critical, particularly if a chord’s shape provides information about its identity. Figure 9-3 shows that a pitch-class encoding is not capable of preserving chord shape.
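The loss of shape can be checked numerically. The sketch below assumes a 12-key keyboard that runs from A up to G#, which is consistent with the inversions reported for Figure 9-3; the helper names are hypothetical. It lists the gaps, in keys, between adjacent active units for the four minor seventh chords discussed above.

```python
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def minor_seventh(root):
    """Unit indices of a minor seventh chord: root plus 3, 7 and 10 semitones."""
    r = PITCH_CLASSES.index(root)
    return [(r + step) % 12 for step in (0, 3, 7, 10)]


def shape_on_keyboard(indices):
    """Gaps (in keys) between adjacent active units on the 12-key keyboard."""
    ordered = sorted(indices)
    return [b - a for a, b in zip(ordered, ordered[1:])]


shapes = {root + "min7": shape_on_keyboard(minor_seventh(root))
          for root in ("A", "C", "D", "G")}
for name, shape in shapes.items():
    print(name, shape)
# Amin7 [3, 4, 3]
# Cmin7 [2, 3, 4]
# Dmin7 [3, 2, 3]
# Gmin7 [4, 3, 2]
```

Four chords of the same type produce four different gap patterns, which is the sense in which the encoding fails to preserve shape.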

In order to create a representation that
preserves tetrachord shape, we must abandon the central assumption that serves as the foundation of pitch-class encoding: octave equivalence. We must adopt an encoding that explicitly indicates that two different notes (e.g. C4, middle C, and C5, the C an octave higher than C4) do not belong to the same pitch-class, but are instead distinct pitches.

Pitch encoding is an alternative to pitch-class encoding, and abandons the octave equivalence assumption. In pitch encoding, each input unit represents the presence or absence of a particular pitch, and not of a pitch-class. For example, C4 and C5 are encoded with different input units. This is illustrated in Figure 9-4, which shows the mapping between pitches (particular piano keys) and input units over a two octave range. Note that in Figure 9-4 the input units are labeled as representing particular pitches (C4, C5, etc.) instead of pitch classes.

Figure 9-4. The mapping of the input units used for pitch encoding onto a 24-key piano keyboard.

In order to use pitch encoding to represent all of the tetrachords in Figure 9-1, more than 24 input units are required to capture all of the pitches. In our version of the problem, the highest key of the progression was G#, and the highest note was C#6 (the highest note in the D#7 tetrachord for this key). Similarly, the lowest key of the progression was A, and as a result the lowest note that we used was A3 (the lowest note in the Amaj7 tetrachord for this key). As a result, our pitch encoding of chords when all chords were in root position required 29 input units, which represented all of the pitches from A3 to C#6.
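A minimal sketch of this pitch code follows. It assumes the common MIDI numbering in which middle C (C4) is note 60; the function names are hypothetical, and the Dm7 voicing (D4, F4, A4, C5) is an illustrative choice for the key of C.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_number(pitch):
    """Convert a name such as 'C#6' to a MIDI note number (assumes C4 = 60)."""
    name, octave = pitch[:-1], int(pitch[-1])
    return 12 * (octave + 1) + NOTE_NAMES.index(name)


LOWEST, HIGHEST = midi_number("A3"), midi_number("C#6")
N_UNITS = HIGHEST - LOWEST + 1  # one unit per pitch from A3 to C#6


def encode_pitches(pitches):
    """Return a 0/1 list with one element per pitch between A3 and C#6."""
    vector = [0] * N_UNITS
    for pitch in pitches:
        vector[midi_number(pitch) - LOWEST] = 1
    return vector


dm7 = encode_pitches(["D4", "F4", "A4", "C5"])
print(N_UNITS)                                   # 29
print([i for i, on in enumerate(dm7) if on])     # [5, 8, 12, 15]
```

Because every pitch has its own unit, transposing a chord shifts the pattern of active units without changing the gaps between them.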



9.4.3 Pitch Encoding of Inversions

Pitch encoding can be used to represent
tetrachords in the form in which they are presented in Figure 9-1. However, as soon as octave equivalence is abandoned other versions of the ii-V-I progression problem are possible.

For instance, imagine that someone was
interested in performing a ii-V-I progression on the piano. The Figure 9-1 score can certainly be performed on this instrument. However, a pianist might prefer alternative versions of the chords that reduce the hand and finger movement required when one moves from one chord to the next.

For instance, if one uses the second inversion of every dominant seventh chord in the progression, then a ‘least action’ version of the progression emerges. The second inversion of a dominant seventh chord is created by taking the two lowest notes in the chord’s root position and raising each an octave. Figure 9-5 provides a version of the Figure 9-1 score in which each of the dominant seventh chords has been inverted. If one compares the Figure 9-5 score to the Figure 9-1 score, then the difference in shape between the dominant seventh tetrachords in each will be apparent.

Figure 9-5. The ii-V-I progression for each possible key. The score is identical to Figure 9-1 with the exception that all dominant seventh chords are represented as second inversions. See text for details.



How does inverting the middle chord of the ii-V-I progression enable least action movement for a pianist? Figure 9-6 illustrates voice leading – i.e. finger movements from one chord to the next – for the ii-V-I progression in the key of C to shed light on this issue.

The top three keyboards in Figure 9-6
illustrate the voice leading when the dominant seventh chord is in root position. The arrows indicate finger movements from chord to chord. Note that because the middle chord is in root position, substantial movement from chord to chord is required: each finger moves to a different key to play the next chord, and the hand must move up and then back down along the keyboard.

Figure 9-6. Voice leading for two versions of the ii-V-I progression. See text for details.

The lower half of Figure 9-6 shows that if the middle chord is played in second inversion form, much less movement is required. The hand stays at the same position along the keyboard, and moving from one chord to the next only requires changing the position of two fingers. Two fingers press the same keys in successive chords for this version of the progression!

In short, an alternative approach to encoding the ii-V-I progression problem is to use pitch encoding, but to also take advantage of its flexibility by presenting dominant seventh chords in their second inversion form. One consequence of this is that slightly fewer processing units are required; all of the tetrachords can be encoded using 24 input units with the lowest unit representing A3 and the highest unit representing G#5.
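The unit counts quoted for the two pitch encodings (29 and 24) can be reproduced with a short calculation. The sketch below rests on assumptions that are inferred from the ranges given in the text rather than read from the scores: tonics run chromatically from A3 to G#4, MIDI numbering has C4 = 60, and the inverted dominant seventh is voiced with its fifth just below the root so that the hand stays in place.

```python
A3 = 57  # MIDI number of A3, assuming middle C (C4) is 60

# Semitone offsets from the chord root for root-position tetrachords.
ROOT_SHAPES = {"m7": (0, 3, 7, 10), "7": (0, 4, 7, 10), "maj7": (0, 4, 7, 11)}
# Second inversion of a dominant seventh (fifth in the bass), voiced in the
# octave that keeps it close to the ii and I chords. This register is assumed.
SECOND_INVERSION_7 = (-5, -2, 0, 4)


def progression(tonic, invert_dominant=False):
    """Return the ii, V and I tetrachords (as MIDI numbers) for one key."""
    ii = [tonic + 2 + s for s in ROOT_SHAPES["m7"]]
    v_shape = SECOND_INVERSION_7 if invert_dominant else ROOT_SHAPES["7"]
    v = [tonic + 7 + s for s in v_shape]
    i = [tonic + s for s in ROOT_SHAPES["maj7"]]
    return ii, v, i


def units_needed(invert_dominant):
    """Number of pitch units spanning every note used across all twelve keys."""
    notes = [n for tonic in range(A3, A3 + 12)
             for chord in progression(tonic, invert_dominant) for n in chord]
    return max(notes) - min(notes) + 1


print(units_needed(False))  # 29
print(units_needed(True))   # 24

# Key of C (tonic C4 = 60): keys held in common between successive chords.
ii, v, i = progression(60, invert_dominant=True)
print(sorted(set(ii) & set(v)), sorted(set(v) & set(i)))  # [62, 65] [67, 71]
```

Under these assumptions two keys are shared at each step of the inverted progression, which corresponds to the two fingers that stay in place in the lower half of Figure 9-6.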

9.4.4 Lead Sheet Encoding

All of the encodings that have been
described above represent each pitch-class or each pitch in a tetrachord. As a result, all involve activating four processing units, and turning all of the remaining processors off.

However, there are many other ways in which tetrachords could be represented, and some of these representations are not concerned with detailing each note in a chord.

For instance, one popular approach to
teaching adults how to play piano (Houston, 2004) attempts to simplify music reading by eliminating traditional musical notation of chords (notation like that found in Figures 9-1 and 9-5). Instead chords are represented in what is called lead sheet notation: they are simply written as a combination of the name of one note (to provide the chord’s root) and some additional symbols which indicate the type of chord. For instance if one was using lead sheet notation for the ii-V-I progression in the key of C, the chords would merely be written as ‘Dm7’, ‘G7’, and ‘Cmaj7’.

We can easily create a lead sheet encoding for an artificial neural network that is to learn the ii-V-I progression. This encoding is very simple, and only requires 15 processors as is illustrated in Figure 9-7. Three of these processors are used to indicate a chord’s type, where only three chord types (m7, 7, maj7) are involved in the ii-V-I progression problem. The remaining twelve processors represent the chord’s root pitch using pitch-class encoding. For example, Figure 9-7 demonstrates how the Dm7 tetrachord can be represented by only activating two units: the unit that represents that the chord is a minor seventh and the unit that indicates that the chord’s root is the pitch-class D.
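The lead sheet code is simple enough to write out directly. In this sketch the unit order (three chord-type units followed by twelve root units running from A to G#) follows the unit labels of Table 9-3 and is otherwise an assumption; the function name and the use of sharps only for root names are illustrative choices.

```python
import re

CHORD_TYPES = ["m7", "7", "maj7"]
ROOTS = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def encode_lead_sheet(symbol):
    """Turn a symbol such as 'Dm7' into a 15-element 0/1 list (type + root)."""
    match = re.fullmatch(r"([A-G]#?)(maj7|m7|7)", symbol)
    if match is None:
        raise ValueError(f"unrecognised chord symbol: {symbol!r}")
    root, chord_type = match.groups()
    vector = [0] * (len(CHORD_TYPES) + len(ROOTS))
    vector[CHORD_TYPES.index(chord_type)] = 1          # one chord-type unit
    vector[len(CHORD_TYPES) + ROOTS.index(root)] = 1   # one root unit
    return vector


dm7 = encode_lead_sheet("Dm7")
print([i for i, on in enumerate(dm7) if on])  # [0, 8]
```

Only two of the fifteen units are ever active, regardless of which chord is encoded.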

Figure 9-7. Lead sheet encoding of the Dm7 tetrachord for an artificial neural network. See text for details.

9.4.5 Summary and Implications

The sections above have discussed four different methods for encoding stimuli (and responses) for the ii-V-I progression problem. The first is pitch-class encoding which has been employed in previous chapters. It has the advantage of simplicity, requiring a reasonably small number of input units. However, it has the disadvantage of using different chord shapes to represent chords of the same type that come from different keys.

The second is pitch encoding, which abandons octave equivalence and represents notes that belong to the same pitch-class, but to different octaves, with different processors. This encoding has the disadvantage of requiring more processing units, but has the advantage of preserving chord shape across keys.

Third, because pitch encoding abandons octave equivalence it also permits different chord inversions to be presented to the network. This permits us to explore the possibility that chord forms that are easier to play (because of their ‘least action’ shapes) may also be easier for a network to learn.

Finally, alternative encodings that are not
intent on representing every note in a chord can also be employed. One that was described for the ii-V-I progression problem is lead sheet encoding. This type of encoding has the disadvantage of not explicitly representing a chord’s pitches or its shape. However, it has the advantage of being extremely simple because any chord in a ii-V-I progression can be represented by simply activating two processing units.

With these possible encodings of the ii-V-I progression described, we can now investigate the effect of problem encoding on network learning. Does problem representation affect network complexity? Does problem encoding alter the amount of training required for a network to solve the ii-V-I progression problem?



9.5 Simulations With Pitch-Class Encoding

Figure 9-8. A multilayer perceptron trained on the ii-V-I progression task. This network encoded the input and output tetrachords using pitch-class representation. See text for details.

9.5.1 Task

All of the networks to be described in the
remaining sections of this chapter learn the ii-V-I progression problem that was described in Section 9.2 using one of the encodings that were discussed in Section 9.4. The current section describes training an artificial neural network when the inputs and outputs of the ii-V-I progression problem are represented using pitch-class encoding.

9.5.2 Training Set

The networks described in Section 9.5 all use pitch-class encoding of the ii-V-I progression problem. The training set for this problem consists of 24 different input/output pairs, where each member of a pair is a particular tetrachord.

9.5.3 Network Architecture

All of the networks described in this section require 12 input units and 12 output units because pitch-class encoding was used. All of the output processors were value units that employ the Gaussian activation function.

Pilot studies were conducted to determine whether a network that uses pitch-class encoding for the ii-V-I
progression problem requires hidden units, and if so, then how many? Results indicated that a network with 7 hidden value units could reliably converge on a solution to either the two-chord-per-key or the three-chord-per-key versions of the task. On occasion, a network with 6 hidden value units could solve the problem, but in most cases networks of this size failed to converge after 30,000 epochs of training or more.

As a result, we decided that a multilayer
perceptron with 12 output value units, 7 hidden value units, and 12 input units, was the most appropriate for learning the pitch-class version of either ii-V-I progression task. The structure of such a network is illustrated in Figure 9-8.

9.5.4 Training

When a multilayer perceptron was trained on the ii-V-I progression problem, the learning rate was 0.01, and connection weights were randomly initialized to values in the range from -0.1 to 0.1. All µs were started at zero, but were trained during learning. (When µs were held constant at 0 networks did not learn to solve the problem.) Typically a network solved this problem in between 3000 and 4000 epochs, where (as in previous chapters) convergence was defined as generating a hit for every output unit on every training pattern.
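The ingredients of this training regime can be sketched as follows. Several details here are assumptions rather than statements from this section: the Gaussian is written in the form exp(-pi (net - mu)^2) that is commonly given for value units (the sign convention for µ may differ in the software actually used), and a ‘hit’ is taken to mean an output within 0.1 of its 0/1 target. The learning rule itself is not shown.

```python
import math
import random


def gaussian_activation(net, mu=0.0):
    """Value-unit response: 1.0 when net equals mu, falling off on both sides.

    The exp(-pi * (net - mu)**2) form is assumed, not taken from this section.
    """
    return math.exp(-math.pi * (net - mu) ** 2)


def is_hit(activity, target, tolerance=0.1):
    """True when the output is within `tolerance` of its target (assumed rule)."""
    return abs(target - activity) <= tolerance


def init_weights(n_in, n_out, limit=0.1, seed=0):
    """Random starting weights drawn uniformly from [-limit, limit]."""
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(n_out)]
            for _ in range(n_in)]


weights = init_weights(12, 7)        # e.g. 12 input units feeding 7 hidden units
print(gaussian_activation(0.0))      # 1.0
print(is_hit(0.95, 1), is_hit(0.5, 1))  # True False
```

Under this criterion a network has converged only when `is_hit` holds for every output unit on every training pattern.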

To quantify network performance we
conducted a small experiment in which ten different multilayer perceptrons (our ‘subjects’) were trained to convergence using the architecture and training settings detailed above. Every one of these networks solved the problem. On average convergence was achieved after 3960.6 epochs of training (SD = 1196.1).

When the number of hidden units was
reduced from 7 to 6, and the network was trained using these settings, convergence was rarely achieved. However, on rare occasions a network was successful in learning the ii-V-I progression. When this occurred between 23,000 and 33,000 epochs of training were required.

9.5.5 Network Interpretation

With converged networks for the ii-V-I
progression problem, the typical next step in our research program would be to interpret a network’s internal structure. There is no reason to believe that this could not be done for one of the converged networks that used pitch-class encoding. The fact that such a network has seven hidden units suggests that interpretation might be challenging and time consuming, but it is certainly tractable.

However, before attempting to interpret one of the networks trained above, we could explore different networks that learn the same problem but use different input/output encodings. This is because it is possible that a change in encoding might produce a network that is simpler and is therefore easier to interpret.

For our purposes, ‘simpler’ has an objective definition: a network with fewer hidden units is simpler than a network with more hidden units. For instance, if a change of encoding permitted a network with only 4 hidden units to solve the ii-V-I progression problem, then this encoding makes the problem simpler than pitch-class encoding (which requires 7 hidden units, as illustrated in Figure 9-8). Furthermore, if a multilayer perceptron can be replaced by a perceptron that has no hidden units, then the encoding has dramatically simplified the problem.

One of the key advantages of solving a
problem with a simpler network is that the internal structure of a simpler network should be easier to interpret. In addition, there may be some important theoretical issues that simpler networks permit to be addressed.

So, let us first explore the results of
training networks using different encodings of the ii-V-I progression problem before deciding on which network to interpret!



9.6 Simulations Using Pitch Encodings of Root Forms

Figure 9-9. A perceptron trained on the ii-V-I progression task. This network encoded the input and output tetrachords using pitch representation of chords in root position. Every input unit is connected to every output unit; only a subset of these connections is illustrated in the figure. See text for details.

9.6.1 Task

The next networks to consider were also trained on the ii-V-I progression problem. However, the difference between these networks and those discussed in Section 9.5 is that the current networks used pitch encoding (instead of pitch-class encoding), and encoded every tetrachord in root position.

9.6.2 Network Architecture

Because all of the networks described in
this section used pitch encoding of chords in root position, they all employed 29 input and 29 output units. The lowest pitch represented by an input (or output) unit was A3, and the highest pitch represented by an input (or output) unit was C#6. All of the output processors were value units that employ the Gaussian activation function.

Importantly, pilot tests revealed that,
unlike the networks described in Section 9.5, no hidden units were required to solve either problem when it was encoded in this fashion. By changing the encoding of the input/output pairs it was now possible for a perceptron to discover a solution to either version of the ii-V-I problem! The network capable of solving the problems is illustrated in Figure 9-9.

9.6.3 Training

Training proceeded with a learning rate
of 0.1, and connection weights started randomly in the range from -0.1 to 0.1. All µs were held at zero throughout learning. Typically a perceptron would converge on a solution to the problem in fewer than 80 epochs of training, where convergence was defined as generating a hit for every output unit on every training pattern. We conducted a small study in which ten different perceptrons were trained on this problem. All ten ‘subjects’ learned to solve the problem. With this encoding on average 63.7 epochs of training were required for a network to learn the ii-V-I progression (SD = 7.36).

9.6.5 Implications

Though the networks were trained on the
same tasks, the choice of encoding had enormous impact. When pitch-class encoding was used multilayer perceptrons that contained 7 hidden units were required to solve the two versions of the problem, and did so after about 4000 epochs of training. In contrast, encoding the same problem in terms of pitches resulted in a much simpler network – a perceptron – that converged after only about 65 epochs of training.



9.7 Simulations Using Pitch Encodings of Inverted Forms

9.7.1 Task

The next networks to consider were also
trained on the ii-V-I progression problem using pitch encoding. However, the difference between these networks and those discussed in Section 9.6 is that the current networks took advantage of pitch encoding’s flexibility and encoded dominant seventh chords as second inversions. Minor seventh and major seventh chords were still encoded in root position. As discussed in Section 9.4.3 inverting the dominant seventh chords in this way produces ‘least action’ transformations between chords in the ii-V-I progression.

9.7.2 Network Architecture

While the networks described in this
section use pitch encoding (as did the networks described in Section 9.6), using the second inversion of dominant sevenths meant that fewer input (and output) units were required to represent chords. The networks discussed in the current section only require 24 input and output units. The lowest pitch represented by a unit was A3, and the highest pitch represented by a unit was G#5. All other pitches between these two extremes were represented by an input (and output) processor. All of the output units were value units. Once again pilot studies revealed that a perceptron was capable of learning the ii-V-I progression with this representation of inputs and outputs.


9.7.3 Training

Training proceeded with a learning rate of 0.1, and connection weights started randomly in the range from -0.1 to 0.1. All µs were held at zero throughout learning. Typically a perceptron would converge on a solution to the problem in fewer than 60 epochs of training, where convergence was defined as generating a hit for every output unit on every training pattern. We conducted a small study in which ten different perceptrons were trained on this problem. All ten ‘subjects’ learned to solve the problem. On average 46.7 epochs of training were required for a network to learn the ii-V-I progression (SD = 7.82).

While the networks of the current section and those of Section 9.6 are all perceptrons, it seems that the current networks converge after less training than did those that only faced chords in root position. We used an independent t-test to compare the performance of 10 ‘subjects’ trained under the conditions of Section 9.7.3 with the 10 networks that were discussed in Section 9.6.3. This test revealed that the current networks converged to a problem solution significantly faster than did the previous set of perceptrons (t = 5.005, df = 18, p < 0.001).
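The reported statistic can be approximately recovered from the summary values alone. The sketch below assumes the pooled-variance (equal variance) form of the independent t-test, which is consistent with df = 18; the function name is hypothetical, and the small difference from the reported 5.005 presumably reflects rounding of the published means and standard deviations.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t statistic and df, assuming equal variances."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Root-position perceptrons (Section 9.6.3) versus inverted-dominant
# perceptrons (Section 9.7.3), ten networks in each group.
t, df = pooled_t(63.7, 7.36, 10, 46.7, 7.82, 10)
print(round(t, 2), df)  # 5.01 18
```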

9.7.4 Implications

In Section 9.4.3 it was argued that if
dominant seventh chords in the ii-V-I progression were in second inversion form then the progression is easier to play in the sense that ‘least action’ is possible. A pianist can move from chord to chord in the progression by only changing the position of two fingers when the middle chord is inverted in this way.

There is no reason to expect that the
potential for ‘least action’ would have any effect on network complexity or performance. This is because processing units do not map onto actions, particularly in the computations involved when networks learn or respond to inputs.

Perhaps not surprisingly, network
complexity was not affected by using inverted chords, because a simple network – a perceptron – could learn the progression whether second inversions were used or not. More surprisingly, though, network training was affected by the presence of second inversions. Networks were able to take advantage of their presence to learn the progression significantly faster than did networks that were only presented chords in root position. Possible implications of this result are considered later in the chapter.



9.8 Simulations Using Lead Sheet Encodings

Figure 9-10. A perceptron that can learn the ii-V-I progression problem encoded in lead sheet format.

9.8.1 Task and Architecture

The last networks to consider were
trained on the ii-V-I progression problem using the lead sheet encoding that was described in Section 9.4.4. These networks require 15 input and output units to represent chords using lead sheet notation. Three of these units represent chord type (minor seventh, dominant seventh, major seventh). The remaining 12 input units represent the root of the chord using pitch-classes. All of the output units were value units. Once again pilot studies revealed that a perceptron like the one in Figure 9-10 was capable of learning the ii-V-I progression with this representation of inputs and outputs.


9.8.2 Training

Training proceeded with a learning rate of 0.1, and connection weights started randomly in the range from -0.1 to 0.1. All µs were initialized to a value of 0, but were modified during training. This is because these networks would not converge when all biases were held at zero. Typically a perceptron would converge on a solution to the problem almost immediately, requiring only 5 or 6 epochs of training to generate a hit for every output unit on every training pattern. We conducted a small study in which ten different perceptrons were trained on this problem. All ten ‘subjects’ learned to solve the problem. On average only 5.8 epochs of training were required for a network to learn the ii-V-I progression (SD = 0.422).

9.8.3 Implications

Once again these simulations reveal the importance of exploring different encoding schemes. All of the networks that have been described in this chapter have been faced with learning the same input/output mappings. However, the ease with which these mappings are acquired depends dramatically on the choice of encoding. If pitch-class encoding is used to encode the four pitch-classes in each tetrachord, then a multilayer perceptron that contains 7 hidden value units is required to reliably achieve convergence in about 4000 epochs. In contrast, if the very simple lead sheet encoding is used, then a perceptron can learn the identical input/output mapping, and do so after only about 6 epochs of training.

We have now explored a wide range of
architectures for learning the ii-V-I progression under a variety of encodings. We have discovered that different encodings have profound impacts on both network complexity and on the amount of training required to learn the ii-V-I progression problem. Let us next turn to exploring the internal structure of a couple of these networks.



9.9 Interpreting A Lead Sheet Perceptron

9.9.1 Encoding and Interpretation

Earlier in this chapter we noted that there
were many different ways in which the same problem could be encoded for network training. The simulation results that were reported earlier revealed that the choice of encoding had an enormous impact on network complexity. In particular, with one encoding the ii-V-I progression problem could be solved with a value unit perceptron, while with another the same problem required a multilayer network of value units that included seven hidden units.

The choice of encoding also has
important implications for network interpretation. Of course, this is largely related to network complexity: if a particular encoding leads to a simpler network, then it is expected that such a network is easier to interpret. However, other factors are also at play.

For instance, one could use the property
‘abstractness’ to compare and contrast the pitch-class encoding described in Section 9.4.1 with the pitch encoding described in Sections 9.4.2 and 9.4.3.

With pitch-class encoding a tetrachord is
only defined by its component pitch-classes. As a result, it fails to make explicit some properties of chords that could be important. For instance, it was shown earlier in Figure 9-3 that this encoding eliminates information about the shape of a chord (i.e. the relative spacing between the chord’s notes on a staff or on a keyboard). Various chords of the same type have different shapes using this encoding.

In other words, the multilayer perceptron of Figure 9-8 cannot learn the ii-V-I progression by simply learning to directly map the shape of an input chord into the shape of an output chord. Instead, the hidden units have to capture some more abstract property, which is why so many hidden units are required in the network. Obviously the seven-dimensional hidden unit space is capturing important musical properties, and in principle one could peer into this network to uncover what these properties are. However, interpreting this network is a challenging task, a task made less palatable with the knowledge that simpler networks for the same problem are also available for interpretation!

A very different situation emerges with pitch encoding. This encoding maps input and output units directly onto a piano keyboard, and therefore makes chord shape an explicit property of chord codes. In this sense, pitch encoding is more concrete: one could literally represent input and output units with piano keys (e.g. Figure 9-4), revealing how they might actually be played.

Interestingly, this encoding might be too
concrete to reveal interesting musical properties. The fact that a perceptron can solve the ii-V-I problem when this encoding is used means that network interpretation reduces to examining the weights of direct connections between input and output units in the context of each output unit’s µ. One might expect that this would reveal a repeating pattern of connection weights that maps one chord shape into another.

However, when weights of trained
networks were examined such repeating patterns of connection weights were not found. Instead, the network’s structure was too concrete: weights were assigned in such a way that a particular output unit would turn on when a particular set of four input units were activated, and off to other input patterns. However, all of the weights were very specialized to each set of causal links. A general, repeated, pattern of connectivity was not discovered and exploited.

It is almost as if the perceptrons learned to map individual notes to other individual notes without recognizing that patterns of notes belonged together in a more abstract category (e.g. as a tetrachord, or a tetrachord of a particular type, or a tetrachord with a particular shape). This would be analogous to a pianist teaching a novice the ii-V-I progression by having them remember that when their fingers are here then next they will be there, but without bothering to teach them that the notes are related as entities called chords.

The lead sheet encoding that was also
explored seems to offer a compromise level of abstraction between the two types of encoding discussed above. On the one hand, it is an abstract encoding in the sense that it does not represent the individual notes of a chord, but instead makes explicit a chord’s root and its abstract type. On the other hand, the abstract properties that it makes explicit are not so abstract that a complicated multilayer perceptron is required to solve the ii-V-I progression. Indeed, when this type of network is interpreted some basic musical properties – in particular, the intervallic relationships between chord roots in the progression – are laid bare in a simple network structure.

The remainder of this section proceeds
as follows. First, we will provide an alternative account of the ii-V-I progression using the circle of perfect fifths. Second, we will examine the connection weights of a perceptron trained with lead sheet encoding to demonstrate that it mirrors this geometric account of this particular progression.

9.9.2 Geometry of the ii-V-I

Earlier in this chapter it was noted that
one key task in establishing tonality was deciding upon which roots to use in each of a succession of chords. The ii-V-I progression is interesting because the progression of roots for its three chords in any key can be determined by following a particular map: the circle of perfect fifths.

This is illustrated in Figure 9-11. The top circle in the figure arranges the twelve pitch-classes of Western music around the circle of perfect fifths, so that adjacent pitch-classes are a musical interval of a perfect fifth apart. The middle circle adds spokes to this circle, as well as chord names, to represent the three chords of the ii-V-I progression for the key of C major. Note how the three spokes pick out three positions adjacent to one another along the circle of perfect fifths. If one rotates the three spokes to a different position within the circle, then the roots of three chords for the same progression in a different musical key are revealed. For example, the spokes in the bottom illustration of the figure reveal the progression for the key of B♭ major.

Figure 9-11. The circle of perfect fifths provides a map between the roots of the three chords in the ii-V-I progression for any key. See text for details.

Figure 9-11 demonstrates that the circle of perfect fifths can be used to map the transition from chord root to chord root in the ii-V-I progression. When this progression is encoded with lead sheet notation, it can be learned by a perceptron. One musical property that lead sheet encoding makes explicit is the root of each tetrachord. We might therefore expect to find that the circle of fifths is encoded in some fashion within the connection weights of this perceptron. Let us proceed with interpreting one of these perceptrons to determine whether or not this is indeed true.
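The geometric account can be written as a small calculation. In this sketch the circle is generated by repeatedly stepping up a perfect fifth (7 semitones); the function names are hypothetical, and sharps are used for all accidentals, so B♭ appears as A#.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_of_fifths(start="C"):
    """Pitch-classes in circle-of-fifths order, each 7 semitones above the last."""
    first = PITCH_CLASSES.index(start)
    return [PITCH_CLASSES[(first + 7 * step) % 12] for step in range(12)]


def ii_v_i_roots(key):
    """Roots of the ii, V and I chords: three adjacent positions on the circle."""
    circle = circle_of_fifths()
    position = circle.index(key)
    return [circle[(position + 2) % 12], circle[(position + 1) % 12], circle[position]]


print(circle_of_fifths())
# ['C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'F']
print(ii_v_i_roots("C"))   # ['D', 'G', 'C']
print(ii_v_i_roots("A#"))  # ['C', 'F', 'A#']
```

Rotating the three spokes to a new key corresponds to changing `position`; the two steps between the returned roots are always one place backwards along the circle.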



9.9.3 Network Interpretation

                                         Output Unit
Input    m7     7  maj7     A    A#     B     C    C#     D    D#     E     F    F#     G    G#
µ     -0.68  0.33 -0.32 -0.53  0.55 -0.55 -0.51  0.57  0.55  0.58  0.54  0.57  0.54  0.56  0.54
m7    -0.35 -0.34 -0.65 -0.32  0.32 -0.32 -0.33  0.29  0.31  0.29  0.33  0.27  0.33  0.32  0.29
7     -0.39  0.68  0.36 -0.31  0.32 -0.33 -0.32  0.31  0.31  0.30  0.34  0.26  0.33  0.33  0.29
maj7  -0.09 -0.09  0.09 -0.06 -0.07  0.07  0.04 -0.02  0.08 -0.09 -0.09 -0.03 -0.03 -0.08 -0.06
A     -0.07  0.04 -0.07 -0.11  0.16 -0.24 -0.19  0.08 -0.76  0.10  0.11  0.11  0.09  0.10  0.20
A#    -0.04 -0.03 -0.08 -0.13  0.10 -0.07 -0.11  0.21  0.19 -0.78  0.06  0.16  0.07  0.05  0.09
B     -0.03 -0.01 -0.03 -0.10  0.08 -0.09 -0.11  0.04  0.06  0.08 -0.78  0.12  0.09  0.08  0.10
C     -0.17 -0.01 -0.06 -0.16  0.16 -0.10 -0.17  0.17  0.10  0.19  0.14 -0.72  0.10  0.06  0.21
C#    -0.09  0.01 -0.03 -0.14  0.07 -0.09 -0.06  0.11  0.05  0.04  0.13  0.16 -0.77  0.08  0.12
D     -0.03  0.01  0.00 -0.08  0.07 -0.12 -0.14  0.06  0.06  0.06  0.12  0.10  0.09 -0.79  0.13
D#    -0.07  0.02 -0.02 -0.11  0.04 -0.14 -0.10  0.12  0.11  0.12  0.08  0.10  0.10  0.06 -0.76
E      0.01  0.03 -0.03  0.76  0.06 -0.09 -0.15  0.06  0.11  0.07  0.05  0.16  0.11  0.03  0.11
F     -0.17  0.06 -0.08 -0.14 -0.77 -0.09 -0.15  0.05  0.17  0.03  0.07  0.09  0.09  0.04  0.04
F#    -0.05  0.04 -0.03 -0.16  0.14  0.77 -0.14  0.20  0.18  0.18  0.09  0.17  0.14  0.17  0.16
G     -0.01  0.07 -0.01 -0.12  0.07 -0.12  0.75  0.09  0.16  0.09  0.05  0.18  0.13  0.13  0.10
G#    -0.17 -0.01 -0.07 -0.16  0.29 -0.06 -0.23 -0.73  0.17  0.11  0.19  0.23  0.17  0.18  0.17

Table 9-3. The connection weights for a perceptron that has learned the ii-V-I progression in lead sheet notation. Each row corresponds to an input source (µ or an input unit) and each column corresponds to an output unit.

A perceptron that learns the ii-V-I progression using the lead sheet encoding has 225 modifiable connection weights (because each of its 15 input units is connected to each of its 15 output units) as well as 15 different µs (one for each output value unit). The value for each of these 240 components for one network that learned the progression after 6 epochs of training is provided in Table 9-3. Stored within this table of numbers is this particular perceptron’s knowledge of the ii-V-I changes. Fortunately, an inspection of Table 9-3 indicates the presence of many patterns that permit the network’s structure to be simplified for interpretation; these key elements are highlighted in the table.

First, there is a distinct pattern of connection weights between pairs of input and output units that represent chord types. Two of these input units (for minor seventh and dominant seventh chords) have two moderate weights to chord type output units (with values around ±0.35) and one more extreme connection weight to a third chord-type output unit (a value around ±0.68). In contrast, the input unit for major seventh chords has a near zero connection weight to each of the three chord-type output units.

Second, there is a repetitive pattern of connection weights between input units that represent chord types and output units that represent pitch-classes. In particular, minor seventh and dominant seventh chord type input units have nearly identical connection weights to the same output pitch-class unit, and all of these weights have a value around ±0.32. In contrast, the major seventh input unit has a near zero connection weight to any pitch-class output unit.

Third, each of the output units that

represent a pitch-class has only one incoming connection weight that has an extreme value (around ±0.74). Importantly this weight comes from the input unit that represents a pitch-class that is a perfect fifth away from the output unit’s pitch-class.



Fourth, the remaining connection weights between input and output units that represent pitch-class are either near zero in value, or have a relatively small value (±0.17). For the purpose of network interpretation this means that these weights can be ignored, because their values are such that a signal sent through them will not result in the output unit turning on.

In short, the behavior of this particular perceptron can be simplified by focusing on the subset of connection weights that are involved in activating output units when a tetrachord is presented to the network. Only a handful of connection weights are involved in converting an input pattern into an output response. This ‘functional pattern of connectivity’ for the Table 9-3 network is illustrated in Figure 9-12.

Figure 9-12. The functional pattern of connectivity for a perceptron that has learned the ii-V-I

progression using lead sheet encoding. The connection weights and values for µ in this figure are taken from Table 9-3. See text for details.

Figure 9-12 illustrates the input and the output units for the perceptron whose full set of connection weights is provided in Table 9-3. The number inside each of the output units is that processor’s µ. Note that none of these values are equal to 0. As a result, a slightly different account of what will turn an output unit on is required. In the Figure 9-12 network, an output unit will only turn on when the net input that it receives ‘cancels out’ its µ. That is, if a value unit has a bias equal to µ, then it will only turn on when its net input equals -µ. Note that many of the networks described in previous chapters – whose output units (with µ = 0) turned on when their net input equals 0 – follow a special case of this more general rule because 0 = -0.

In Figure 9-12, the order of pitch-class units in the input layer is different than that for the output units. The units have been rearranged in the figure so that a particular input unit is directly below the output unit that represents a pitch-class a perfect fifth away from the input unit’s.

The number in the rectangle directly

above each input pitch-class unit is the weight of the connection between the input unit and the output pitch-class unit directly above it in the figure. A vertical line in the figure indicates the existence of this connection from an input unit to an output unit.

The input units for minor seventh and

dominant seventh chords have important connections with each output pitch-class unit; the weight of this connection (from either of these two different input units) is provided in each rectangle directly below an output pitch-class unit. Note that the major seventh input unit has no such connections; indeed, it has no functional purpose in this network and therefore there are no connections drawn from this input unit to any of the output units!

There are important connections from the

minor seventh and the dominant seventh input units to each of the three output units for chord type. Each of these connections has been drawn in Figure 9-12, but the weight values have not been included in the figure to avoid clutter. These values are presented in Table 9-3 and their function will be described shortly.

With properties of Figure 9-12 described,

we are now in a position to explain how this perceptron operates.

To begin, let us consider the conditions

that will cause a particular output unit to turn on when this output unit represents the root note of the output chord. Such an output unit will only activate when 1) either the m7 or D7 input unit is on and 2) the input unit representing a pitch-class that is a perfect fifth away from the output unit’s is also on. It is only in such a circumstance that the net input to the output unit will be equal to -1 times its value of µ, cancelling µ out.

For instance, consider the output unit representing the pitch-class A, which has µ = -0.53. When activated, the m7 input unit or the D7 input unit will send a signal of -0.32 to this output unit. When the E input unit – which is a perfect fifth away from A – is activated, it will send a strong signal of 0.76 to the A output unit. The two signals being received by this output unit in this situation (-0.32 and 0.76) sum to a total of 0.44 which, when combined with µ, produces a net input of -0.09. This net input is close enough to zero to produce high activity in the output unit (0.97) when the Gaussian activation function is employed. A similar account for any output pitch-class unit can be extracted from Figure 9-12.
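The arithmetic in this example can be checked directly. The sketch below assumes that the Gaussian activation function of a value unit has the form exp(-π(net + µ)²), which is consistent with the rule that a unit turns on when its net input equals -µ and which reproduces the reported activity of 0.97. The function name `value_unit_activity` is hypothetical; the numbers are read from Table 9-3.

```python
import math

def value_unit_activity(net_input, mu):
    # Assumed Gaussian form: activity is maximal (1.0) when net_input == -mu.
    return math.exp(-math.pi * (net_input + mu) ** 2)

mu_A = -0.53        # bias of the A output unit
w_m7_to_A = -0.32   # weight from the m7 input unit
w_E_to_A = 0.76     # weight from the E input unit (a perfect fifth from A)
w_D_to_A = -0.08    # weight from the D input unit (not a fifth from A)

# Em7 presented: the m7 unit and the E unit are both on.
em7 = value_unit_activity(w_m7_to_A + w_E_to_A, mu_A)
# Dm7 presented: the m7 unit is on, but the root is the wrong one for A.
dm7 = value_unit_activity(w_m7_to_A + w_D_to_A, mu_A)

print(round(em7, 2))  # 0.97
print(dm7 < 0.1)      # True: the A unit stays off
```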

In order to generate a complete response

in the ii-V-I progression problem, the network must also activate one of its chord type output units. This is accomplished via the signals coming from the same types of units in the input pattern. In general, the presence of one chord type in the input results in a different chord type being generated in the output. The connection weights in Table 9-3 reveal that the strength of the connection from one chord type to the next is almost equal to -µ, where µ is the bias of the output unit. For instance, the weight from the m7 input unit to the D7 output unit is -0.34, while the D7 output unit has µ = 0.33. The other connection weights between input and output chord types are such that an input chord unit will only activate the appropriate next chord type, and will fail to activate the other two (incorrect) chord type units.
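The chord type portion of Table 9-3 can be checked in the same way. The sketch below again assumes a Gaussian activation of the form exp(-π(net + µ)²) and, for simplicity, ignores the small signals that pitch-class input units send to chord type output units. The dictionary layout and the function name are illustrative only.

```python
import math

TYPES = ["m7", "7", "maj7"]
# Biases of the three chord type output units (row µ of Table 9-3).
MU = {"m7": -0.68, "7": 0.33, "maj7": -0.32}
# W[input type][output type], copied from Table 9-3.
W = {
    "m7": {"m7": -0.35, "7": -0.34, "maj7": -0.65},
    "7":  {"m7": -0.39, "7": 0.68,  "maj7": 0.36},
}

def chord_type_outputs(input_type):
    """Activity of each chord type output unit for one active chord type input."""
    return {out: math.exp(-math.pi * (W[input_type][out] + MU[out]) ** 2)
            for out in TYPES}

for input_type in W:
    acts = chord_type_outputs(input_type)
    winner = max(acts, key=acts.get)
    print(input_type, "->", winner, {k: round(v, 2) for k, v in acts.items()})
```

Under these assumptions the m7 input drives only the dominant seventh output, and the dominant seventh input drives only the major seventh output.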

Figure 9-13 relates the causal relations that we have just described for the network to the geometric description of the ii-V-I progression that was developed in Section 9.9.2. The top of this figure displays three chord type units; the arrows between them indicate the causal links between units (i.e. each arrow shows that input activity in the unit at its base causes output activity in the unit at its arrowhead). So, when the m7 unit is turned on, it causes output activity in the D7 unit. Similarly, when the D7 unit is turned on, it causes output activity in the Maj7 unit. The Maj7 unit causes no activity in any other units, which is why no arrows emanate from it.

Activity in either the m7 or the D7 unit also sends activity to pitch-class units. This is represented in Figure 9-13 by having a second arrow from each unit connect them to an apex of a triangle which in turn sends signals to pitch-class units. Both are connected to the same apex because (as shown earlier in Figure 9-12 and Table 9-3) both of these input units have essentially the same connection weight to a pitch-class unit. The arrow from the apex to each pitch-class unit indicates the role that either the m7 or the D7 unit plays in turning an output pitch-class unit on.

The pitch-classes in Figure 9-13 are

arranged around a circle of perfect fifths. Arrows around this circle indicate causal links from an input pitch-class to an output pitch-class. For instance, for the A unit to turn on, it must receive a signal from the E unit which is adjacent to it in the circle. No other pitch-class unit will turn A on.

However, A will only turn on if it receives a signal from E and from either the m7 or the D7 unit at the same time.

In short, when lead-sheet notation was

used to encode stimuli and responses, a perceptron learned to carve the ii-V-I problem up into two different tasks: the causal relations between input and output chord types, and the causal relations between input and output pitch-classes. Its solution to the problem identified the fact that pitch-class relations were organized in terms of perfect fifths, but also required input from appropriate chord types. The solution also recognized that chord-type relations were independent of chord roots. What is amazing is that such an elegant solution was achieved after such a small amount of training!

Figure 9-13. Causal links between chord type units and pitch-class units. These causal links are taken from Figure 9-12 and Table 9-3, but are illustrated here using the circle of perfect fifths. See

text for details.



9.10 A Progression of Progressions

9.10.1 Second-order Progression

The ii-V-I progression problem that we have been discussing throughout the current chapter is the simplest version. One interesting property of this progression is that it is fairly easy to play the progression in one key, and then play the same progression in a key that is a full tone lower. As a result, one can perform a progression of progressions, changing key until one reaches the same key that was played at the beginning, although now the chords will be an octave lower than the first ones played. This might be called a second order progression.

The reason that one can have such a

progression of progressions is because it is fairly straightforward to change from the major seventh chord that ends the ii-V-I progression to a minor seventh chord that begins the progression in the next key. This is because the two chords are built upon the same root.

This is illustrated geometrically in Figure 9-14. Each circle in this figure arranges chord roots around a circle of perfect fifths. The three spokes in the top circle pick out the three chords for the ii-V-I progression in the key of C, a progression that ends with Cmaj7. One can take this major seventh chord and change two of its notes to produce a Cm7 chord. This chord is the first part of the ii-V-I progression for the key of B♭ major. The three chords for the ii-V-I progression in B♭ major are identified by the three spokes in the bottom circle of Figure 9-14.

Note that Figure 9-14 illustrates that moving from the ii-V-I progression in one key to the same progression in the next key involves taking three spokes that pick out the first three chords and rotating them counterclockwise by 60°. If the bottom set of spokes is rotated in the same direction by the same amount they will point to the ii-V-I progression chords in the next key (A♭ major). Six such rotations from the beginning position (the top of Figure 9-14) will return them to their original location.

Figure 9-14. Geometric illustration of the relation between two ii-V-I progressions in adjacent keys. See text for details.

The musical score illustrated in Figure 9-15 provides the second-order progression that is created by beginning with the top chords illustrated in Figure 9-14 (the ii-V-I progression for C major) and repeatedly employing the 60° rotation rule until the key of C major is reached again. Notice that the first three chords in the Figure 9-15 score are each an octave higher than their respective chords at the end of the score.



Figure 9-15. The progression of ii-V-I progressions created by starting in the key of C major and following the procedure illustrated in Figure 9-14 to move from key to key.

An inspection of Figure 9-15, as well as a consideration of the procedure illustrated in Figure 9-14, reveals that the method for producing a second-order ii-V-I progression only produces the progression for half of the available major keys. Indeed, the different keys for which this method generates chords all belong to the same circle of major seconds; none of the keys that belong to the other circle of major seconds have their chords generated. To generate them, the initial position of the spokes in Figure 9-14 must be changed to pick out the chords for a key that belongs to the other circle of major seconds (e.g. D♭ major). When this is done, and the 60° rotation procedure is implemented, the chords that belong to the remaining six major keys are generated. This second version of the second-order ii-V-I progression is presented in the musical score found in Figure 9-16.

Figure 9-16. The progression of ii-V-I progressions created by starting in the key of D♭ major and following the procedure illustrated in Figure 9-14 to move from key to key. This progression generates the chords for the six major keys that are not represented in Figure 9-15.
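The split into two groups of six keys follows from the arithmetic of the rotation. A 60° rotation moves the spokes two positions around the twelve-position circle of perfect fifths, which lowers the key by a whole tone. The minimal sketch below uses hypothetical names and sharp spellings (A# for B♭, C# for D♭) to list the keys visited from each starting point.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def second_order_keys(start_key):
    """Keys visited by repeatedly moving down a whole tone (two semitones)."""
    i = PITCH_CLASSES.index(start_key)
    return [PITCH_CLASSES[(i - 2 * step) % 12] for step in range(6)]

from_c = second_order_keys("C")
from_d_flat = second_order_keys("C#")
print(from_c)       # ['C', 'A#', 'G#', 'F#', 'E', 'D']
print(from_d_flat)  # ['C#', 'B', 'A', 'G', 'F', 'D#']
```

The two lists share no key and together cover all twelve, which is why two separate second-order progressions are needed.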



9.10.2 Second-order Problem

The creation of the two versions of the second-order ii-V-I progression permits us to create a slightly more complicated version of the ii-V-I progression problem to be learned by artificial neural networks. In this second-order problem, the task for the network is the same: when presented an input chord, generate the next chord in the progression. However, the second-order version of the problem permits major seventh chords to be inputs that result in the generation of an output chord: the minor seventh chord that begins the ii-V-I progression in the next key. This is really the only difference between this new version of the problem and the simpler version that has been the subject of earlier sections in the current chapter.

In the old (first-order) version of the

problem, the major seventh chord was only a response, and never a stimulus. Similarly, the minor seventh chord was only a stimulus, and never a response. As a result the first-order version of the problem only required 24 patterns in its training set.

In the second-order version of the ii-V-I

problem, there is an additional stimulus, because a major seventh chord input now leads to a minor seventh chord response. As a result, the second-order ii-V-I progression problem has a total of 36 training patterns instead of 24.
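The pattern counts can be reproduced by enumerating the stimulus/response pairs. The sketch below is an illustration, not the chapter's training code: the function names are hypothetical, chords are written as (root, type) pairs, and the progression rules are the ones described in the text (roots fall by a perfect fifth from ii to V and from V to I; in the second-order problem a major seventh chord leads to the minor seventh chord on the same root).

```python
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def down_a_fifth(root):
    # Falling a perfect fifth reaches the same pitch-class as rising five semitones.
    return PITCH_CLASSES[(PITCH_CLASSES.index(root) + 5) % 12]

def training_patterns(second_order=False):
    """Return a list of ((root, type), (root, type)) stimulus/response pairs."""
    patterns = []
    for root in PITCH_CLASSES:
        patterns.append(((root, "m7"), (down_a_fifth(root), "7")))
        patterns.append(((root, "7"), (down_a_fifth(root), "maj7")))
        if second_order:
            # Only the second-order problem uses major sevenths as stimuli.
            patterns.append(((root, "maj7"), (root, "m7")))
    return patterns

print(len(training_patterns()))                   # 24
print(len(training_patterns(second_order=True)))  # 36
```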

Apart from an additional 12 stimuli, the

second-order version of the ii-V-I is nearly identical to the first-order version. In particular, input and output chords are treated in the same fashion, and can be encoded in the various formats that were detailed in Section 9.4. We used these encodings to create four different versions of the second-order ii-V-I progression problem, and then determined how problem encoding impacted network complexity.

9.10.3 Training Results

This section provides a brief account of the results of training networks on different encodings of the second-order ii-V-I progression problem. The results presented below are intended to complement the more detailed results for training networks on the first-order version of the problem that were presented earlier in this chapter.

When pitch-class encoding was employed, the second-order ii-V-I problem was more complicated than the first-order problem. In order to achieve reliable and relatively fast convergence, two more hidden value units had to be added to the multilayer perceptron illustrated in Figure 9-8. With 9 hidden units, with µs trained during learning, and with a learning rate of 0.01, a solution to the problem was typically achieved in between 4500 and 6500 epochs of training. If the number of hidden units was reduced to 8, then the network typically failed to converge after more than 25,000 epochs of training, and a smaller learning rate (0.005) was required to achieve some progress. On occasion an 8 hidden unit network would converge after a larger number of training sweeps (at least 23,000), and even more rarely a network would converge after fewer than 8,000 sweeps. It would appear that an 8 hidden unit network would only converge if its randomly selected starting state was highly advantageous.

Another version of the second-order ii-V-I

progression problem was encoded using a pitch representation of tetrachords in root position. Similar to the results for the first-order problem (Section 9.6), this encoding permitted a value unit perceptron to learn a solution with output unit µs held constant at zero throughout learning. With a learning rate of 0.1 this kind of simple network would typically learn a solution to the problem in between 100 and 200 epochs of training.

Interestingly, when pitch encoding was

used to represent inverted chords in the second-order ii-V-I problem, the problem was more difficult than was the case for the first-order version of the problem. In contrast to the situation in which non-inverted chords were presented, a value unit perceptron was not able to learn a solution to the problem. This was somewhat surprising because we expected that the inverted chords would be easier to learn. The simplest network that would learn the second-order problem was a multilayer perceptron that had a single hidden unit, and also had direct connections between input and output units. A complete account of



why inverted chords cause problems for the second-order problem, but not for the first, would require a detailed analysis of the internal structure of networks. However, such an analysis will not be presented here.

The final representation that we examined for the second-order ii-V-I progression problem was lead sheet encoding. As was the case for the first-order problem, this kind of encoding led to fast solutions by simple networks. With a learning rate of 0.1 and with µs modified during learning a value unit perceptron would learn the second-order problem after approximately 40 epochs of training. This suggests that even with this encoding the second-order problem was more difficult than the first-order problem, in the sense that slightly more training was required. However, solutions to either version of the ii-V-I progression problem could be discovered by a value unit perceptron – when lead sheet encoding was employed.

9.10.4 Network Interpretation

Input   Output unit
unit       m7      7   maj7      A     A#      B      C     C#      D     D#      E      F     F#      G     G#
µ        0.59   0.55  -0.54  -0.92   0.92   0.94   0.96  -0.96  -0.93  -0.92   0.96   0.98  -0.96  -0.89   0.90
m7       0.60  -0.59  -0.61  -0.60   0.61   0.62   0.60  -0.59  -0.61  -0.65   0.61   0.60  -0.59  -0.66   0.66
7        0.59   0.62   0.59  -0.60   0.61   0.62   0.60  -0.59  -0.61  -0.65   0.60   0.60  -0.58  -0.66   0.66
maj7    -0.62   0.61  -0.62   0.26  -0.25  -0.25  -0.27   0.28   0.25   0.22  -0.27  -0.27   0.28   0.21  -0.21
A        0.03   0.05  -0.05   0.60   0.33   0.30   0.31  -0.33   1.54  -0.32   0.29   0.27  -0.32  -0.34   0.32
A#       0.03   0.05  -0.05  -0.31  -0.60   0.33   0.31  -0.31  -0.31   1.57   0.35   0.29  -0.33  -0.30   0.32
B        0.03   0.05  -0.05  -0.32   0.32  -0.62   0.29  -0.31  -0.31  -0.29  -1.57   0.28  -0.33  -0.30   0.30
C        0.03   0.05  -0.05  -0.34   0.32   0.32  -0.63  -0.31  -0.32  -0.27   0.32  -1.57  -0.31  -0.32   0.30
C#       0.03   0.05  -0.05  -0.37   0.34   0.32   0.32   0.62  -0.33  -0.31   0.31   0.31   1.54  -0.31   0.33
D        0.03   0.05  -0.05  -0.32   0.34   0.32   0.30  -0.32   0.61  -0.31   0.29   0.29  -0.32   1.55   0.31
D#       0.03   0.05  -0.05  -0.32   0.35   0.32   0.30  -0.33  -0.30   0.64   0.30   0.28  -0.32  -0.31  -1.55
E        0.03   0.05  -0.05   1.53   0.33   0.33   0.31  -0.34  -0.32  -0.29  -0.63   0.28  -0.33  -0.30   0.30
F        0.03   0.05  -0.05  -0.33  -1.53   0.32   0.32  -0.31  -0.31  -0.31   0.40  -0.64  -0.34  -0.33   0.30
F#       0.03   0.05  -0.05  -0.32   0.34  -1.56   0.34  -0.30  -0.32  -0.28   0.31   0.32   0.61  -0.32   0.31
G        0.03   0.05  -0.05  -0.32   0.35   0.32  -1.56  -0.33  -0.32  -0.30   0.29   0.31  -0.31   0.62   0.30
G#       0.03   0.05  -0.05  -0.32   0.33   0.31   0.29   1.55  -0.33  -0.30   0.31   0.28  -0.32  -0.30  -0.62

Table 9-4. The connection weights for a perceptron that has learned the second-order ii-V-I progression in lead sheet notation. Each row corresponds to an input source (µ or an input unit) and each column corresponds to an output unit.

In order to complete the parallels between our earlier examination of the first-order ii-V-I progression problem and the current consideration of the second-order ii-V-I problem, let us proceed with an interpretation of the perceptron’s structure for solving the second-order problem when lead sheet encoding is employed.

Table 9-4 presents the connection weights of one such perceptron. As was the case when we examined Table 9-3, within all of the connection weights in Table 9-4 there is a tractable subset of connection weights that are functionally important; these weights have been highlighted in the table.

An examination of Table 9-4 reveals that it shares a great deal of the functional structure seen earlier in Table 9-3, structure that was used to create Figure 9-12. First, the most extreme weight feeding into an output pitch-class unit comes from an input pitch-class unit that is a perfect fifth away. Second, the connections to these output units from either of the m7 or D7 input units are equal in weight. Third, the sum of the signal from an m7 or a D7 unit plus a signal from an input pitch-class unit a perfect fifth away is sufficient to nearly cancel out the output unit’s µ. In short, this perceptron is structured to respond to input minor seventh or input dominant seventh chords in exactly the same way that was illustrated earlier in Figure 9-12.

The differences between Tables 9-3 and

9-4 reveal that the perceptron trained on the second-order ii-V-I problem has additional important connection weights that permit it to respond correctly when major seventh chords are presented to it.

First, the next most extreme connection

weight that feeds into an output pitch-class unit comes from an input pitch-class unit that represents the same pitch-class (i.e. the input is an interval of perfect unison away from the output). For example, the connection between the input unit representing A to the output unit representing A has a weight of 0.60. This relation makes sense because in the second-order version of the progression, a major seventh chord leads into a minor seventh chord that has the same root note.

Second, the connection weights from the

major seventh input unit to each of the output pitch-class units are now moderately large (in comparison to the same kinds of weights in Table 9-3), and are the same sign as the connection weights to the same output unit from the input pitch-class unit that represents the same pitch. As a result, the two signals – one from the major seventh input unit, the other from a pitch-class unit – combine to create a more extreme signal. Finally, this combination sums to a value that nearly cancels out the output pitch-class unit’s µ, turning it on.

For example, consider the output pitch-class unit for A with µ of -0.92. The weight to it from the input A unit is 0.60, and the weight to it from the input major seventh unit is 0.26. When these two input units are turned on they together send a total signal of 0.86 which combines with µ to create a net input of -0.06 which is close enough to zero to turn the output unit on.

Third, the connection weights from the input major seventh chord unit to each of the three output chord type units are now all substantially different than zero. The weights are such that when the major seventh input unit is on it will turn on the minor seventh output unit, and will fail to activate the other two chord type units.
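The numerical claims about major seventh inputs can be checked against Table 9-4. As a hedged sketch, the code below assumes a Gaussian activation of the form exp(-π(net + µ)²), so that a value unit is fully on when its net input equals -µ; the variable names are illustrative.

```python
import math

def activity(net_input, mu):
    # Assumed Gaussian value unit: activity peaks at 1.0 when net_input == -mu.
    return math.exp(-math.pi * (net_input + mu) ** 2)

# Pitch-class output unit A (mu = -0.92), weights from Table 9-4.
mu_A = -0.92
a_from_amaj7 = activity(0.26 + 0.60, mu_A)   # maj7 unit + A unit (unison)
a_from_em7 = activity(-0.60 + 1.53, mu_A)    # m7 unit + E unit (perfect fifth)
a_from_emaj7 = activity(0.26 + 1.53, mu_A)   # maj7 unit + E unit: wrong pairing

# Chord type output units when the maj7 input unit is on. Every pitch-class
# input unit sends the same small signal (0.03, 0.05, -0.05) to these units.
chord_type = {
    "m7": activity(-0.62 + 0.03, 0.59),
    "7": activity(0.61 + 0.05, 0.55),
    "maj7": activity(-0.62 - 0.05, -0.54),
}

print(round(a_from_amaj7, 2), round(a_from_em7, 2), round(a_from_emaj7, 2))
print({name: round(value, 2) for name, value in chord_type.items()})
```

Under this assumed activation function, the A unit responds strongly to Amaj7 and to Em7 but not to Emaj7, and a major seventh input activates only the minor seventh chord type output.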

In short, the perceptron whose structure

is detailed in Table 9-4 is functionally identical to the perceptron of Table 9-3, but includes additional weights. These weights provide additional functionality that produces correct responses when major seventh chords are presented to the network.

As was illustrated in Figure 9-13, the

functional operation of the perceptron separates chord type responses from chord root responses.

With respect to the activation of an

output chord type unit, each of these output units will respond only when a particular input chord type unit is turned on. A minor seventh chord output will only be activated by a major seventh chord input. A dominant seventh chord output will only be activated by a minor seventh chord input. A major seventh chord output will only be activated by a dominant seventh chord input.

With respect to chord root, only three

different situations will cause an output pitch-class unit to turn on. First, it will turn on if a minor seventh chord unit is on at the same time that the input pitch-class unit a perfect fifth away is activated. Second, it will turn on if a dominant seventh chord unit is on at the same time that the input pitch-class unit a perfect fifth away is activated. Third, it will turn on if a major seventh chord is on at the same time that the input pitch-class unit a unison away is activated.

Of these three rules, the first two are

identical to those found in the perceptron for the first-order ii-V-I progression problem. This new perceptron solves the second-order ii-V-I problem by discovering that structure, and adding a small amount of additional functionality to deal with the progression of progressions.
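Taken together, the chord type rules and the chord root rules amount to a small lookup procedure. The following sketch expresses them symbolically; it is an interpretation of the network's behavior written with hypothetical names, not code extracted from the network.

```python
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
NEXT_TYPE = {"m7": "7", "7": "maj7", "maj7": "m7"}
# Semitone shift applied to the root: a falling fifth equals +5 modulo 12,
# and a unison leaves the root unchanged.
ROOT_SHIFT = {"m7": 5, "7": 5, "maj7": 0}

def next_chord(root, chord_type):
    """Next chord in the second-order ii-V-I progression."""
    i = PITCH_CLASSES.index(root)
    return PITCH_CLASSES[(i + ROOT_SHIFT[chord_type]) % 12], NEXT_TYPE[chord_type]

chord = ("D", "m7")
sequence = [chord]
for _ in range(18):
    chord = next_chord(*chord)
    sequence.append(chord)

print(sequence[:4])  # [('D', 'm7'), ('G', '7'), ('C', 'maj7'), ('C', 'm7')]
print(sequence[-1])  # ('D', 'm7')
```

Eighteen steps (six keys of three chords each) return to the starting chord, matching the six rotations of the second-order progression.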



9.11 Summary and Implications

At the start of this book (e.g. Chapter 2, Chapter 4, Figure 4-1) artificial neural networks were introduced as artifacts that are primarily used for pattern classification. That is, they arrange input patterns as points in a space (either a pattern space or a hidden unit space, depending upon network type), and output units carve this space into decision regions. If a pattern falls into one decision region, the network generates one kind of response (i.e. one kind of ‘pattern name’); if it falls into a different decision region, a different response is generated.

In earlier chapters we have

demonstrated that pattern classification is a general ability that can be applied very neatly to a variety of musical problems. For example, we have used it to identify scale tonics, scale modes, musical keys, and chord types.

The current chapter has demonstrated a

further flexible use of pattern classification in which the response generated by a network to an input chord is a special name: the name of another chord. This permits a network to represent chord progressions in its internal structure. We demonstrated this ability by training networks on two different versions (first-order and second-order) of an important chord progression, the ii-V-I changes.

In addition to demonstrating this ability,

this chapter also explored the importance of how one encodes network stimuli and responses. All of the networks described in this chapter learned the same chord progression. However, networks differed from one another in how input and output chords were encoded. One of the main results of the current chapter was that choice of encoding had enormous impact on problem complexity.

In particular, we discovered that when the ii-V-I progression was encoded using the very abstract pitch-class representation of individual chord notes, the problem was very difficult. Multilayer perceptrons with several hidden value units were required to converge to a solution when this encoding was employed.

In contrast, other encodings of the ii-V-I progression problems permitted much simpler networks to learn the problem. For instance, we discovered that pitch encoding of chords, and of chord inversions, led to very simple networks (in most cases perceptrons) solving the same problem that required a multilayer network when the more abstract encoding was used. With lead sheet encoding, both the first-order and second-order problems were also quickly solved by a perceptron. The structure of these perceptrons was easy to analyze, and was easily related to a traditional geometric account of the ii-V-I progression.

The purpose of the current chapter was

simply to illustrate the importance of encoding choices. However, it is important to keep in mind the implications of such choices.

Obviously problem difficulty is impacted

by problem encoding. What encoding, then, should we choose for our networks? It might be very tempting to explore a variety of different and plausible encodings, and then to choose the one that generates the simplest networks.

In some cases this might very well be the

appropriate strategy. However, other factors must also be considered when choosing an encoding.

For example, perhaps the goal of a

network is to provide insight into the formal regularities that govern a specific musical problem. In this case, the encoding that leads to the simplest network may not be the most appropriate, because the encoding may make certain musical regularities disappear. We saw earlier in this chapter that one key element of the musical theory of chord progressions is voice leading. The lead sheet notation described in this chapter generates simple networks, but essential properties related to voice leading are hidden by this encoding. So, if one is interested in using networks to explore regularities of voice leading, then the encoding that leads to the simplest network may not be the most appropriate.



As another example, perhaps the goal of

training a musical network is to discover representations that serve as the basis for musical cognition. In this case, we may not be searching for the encoding that produces the simplest networks. We might be searching for the encoding that generates the greatest similarity between various measures of network performance and structure and measures of performance of human listeners in a musical cognition experiment.

From the perspective of musical

cognition, human listeners are ‘black boxes’. This is because we cannot directly observe the internal structures and processes that mediate musical cognition. Instead we can only infer these internal properties on the basis of observations of external behavior. This process of inference is called reverse engineering: by observing human responses to musical stimuli in a variety of clever experimental situations, we attempt to discover the structures, processes, or algorithms inside the black box.

Reverse engineering is hard enough because we cannot directly see inside the black box. A second issue that makes reverse engineering challenging is that each input/output or stimulus/response pairing that we can observe can be mediated by more than one process. There is a many-to-one mapping from possible structures, processes, or algorithms to input/output relations (Dawson, 2013). As a result, we might believe that one process is responsible for mediating the behavior that we observe, but in reality a very different process might be responsible. What is required are special observations that can help to distinguish one theory about what is inside the black box from another.

Fortunately, black boxes will generate some observable behaviors that are side effects of the processes inside the black box. These side effects, called artifacts by Dawson (2013), can provide critical information for theory validation (Pylyshyn, 1980, 1984).

For instance, one consequence of representing a problem in a particular format might be that some instances of the problem can be solved quickly, while other instances are more difficult to solve. In performing mental arithmetic, for example, one might expect that if numbers were represented mentally in columns then addition problems that require carrying digits from one column to another would take longer than problems that did not require this operation. One can collect relative complexity evidence (Pylyshyn, 1984) to investigate artifacts of this type. With relative complexity evidence, one varies the nature of problems presented to a system, and then explores the relationship between the properties of the problems and the time required to solve them.
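The mental arithmetic example can be made concrete with a minimal sketch. It assumes a columnwise representation of numbers and simply counts the carries an addition problem would require; the function name count_carries and the sample problems are illustrative assumptions, not part of any study cited here. Under the columnar hypothesis, this count is the problem property that one would relate to measured solution times.

```python
def count_carries(a: int, b: int) -> int:
    """Count the carries needed to add two non-negative integers column by column."""
    carries = 0
    carry = 0
    while a > 0 or b > 0:
        # Add the current (rightmost) column plus any carry from the previous column.
        column_sum = a % 10 + b % 10 + carry
        carry = 1 if column_sum >= 10 else 0
        carries += carry
        # Move one column to the left.
        a //= 10
        b //= 10
    return carries


# Hypothetical problem set, ordered by the predictor of interest.
problems = [(23, 45), (27, 45), (58, 67), (999, 1)]
for a, b in problems:
    print(f"{a} + {b}: {count_carries(a, b)} carries")
```

If response times increased with the number of carries, that pattern would count as relative complexity evidence in favor of a columnar representation; a flat relationship would count against it.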

A related kind of data concerns intermediate state evidence (Pylyshyn, 1984). This kind of evidence presumes that information processing inside the black box requires a number of different processing stages, and that each stage might represent intermediate results in a different format. To collect intermediate state evidence, one attempts to determine the number and nature of these intermediate results. For example, when researchers determined that items in short-term memory were confused with similar sounding items (Conrad, 1964) and not with items with similar meaning, this suggested that an intermediate memory store used an acoustic encoding (Waugh & Norman, 1965).

A particular type of data, called error evidence (Pylyshyn, 1984), is very well suited to determining intermediate states. When extra demands are placed on a system’s resources, it may not function as designed, and its internal workings are likely to become more evident (Simon, 1969). This is not just because the overtaxed system makes errors in general, but because these errors are often systematic, and their systematicity reflects the underlying representation. One study (Yaremchuk & Dawson, 2005) investigated a multilayer perceptron trained to identify tetrachord types. It was discovered that when some of its hidden units were removed, the network only made very specific errors: it failed to identify tetrachords as being major when, and only when, they were in their second inversion form. This suggested that the role of the missing hidden units was to permit the network to deal with this rather specialized type of input.
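The logic of collecting error evidence by removing hidden units can be illustrated with a toy sketch. It is not the tetrachord network from the study cited above: it is a hand-wired two-hidden-unit network for exclusive-or, and the weights, the function names, and the ablate parameter are all assumptions made for illustration. The point is only that removing one hidden unit produces errors on a specific subset of inputs, and that this subset reveals what the unit was for.

```python
from itertools import product


def step(x: float) -> int:
    """Threshold activation: fire only when net input is positive."""
    return 1 if x > 0 else 0


def network(x1: int, x2: int, ablate=()) -> int:
    """Hand-wired XOR network; `ablate` lists hidden unit indices to silence."""
    # Hidden unit 0 detects "at least one input on" (OR).
    # Hidden unit 1 detects "both inputs on" (AND).
    hidden = [step(x1 + x2 - 0.5), step(x1 + x2 - 1.5)]
    for i in ablate:
        hidden[i] = 0  # a removed unit contributes no activity
    # Output fires when OR is on and AND is off.
    return step(hidden[0] - hidden[1] - 0.5)


def errors(ablate=()):
    """Return the input patterns that the (possibly lesioned) network gets wrong."""
    return [
        (x1, x2)
        for x1, x2 in product((0, 1), repeat=2)
        if network(x1, x2, ablate) != (x1 ^ x2)
    ]


print("intact:", errors())
print("without hidden unit 1:", errors((1,)))
print("without hidden unit 0:", errors((0,)))
```

In this sketch the intact network makes no errors, while removing hidden unit 1 causes a failure only on the pattern in which both inputs are on. That systematic failure identifies the unit as the one responsible for this specialized case, which parallels the inference drawn about the missing hidden units in the tetrachord network.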

What is the relationship between relative complexity evidence, intermediate state evidence, error evidence, and choice of encoding? In many cases researchers are specifically interested in using artificial neural networks to serve as models of human musical cognition (Griffith & Todd, 1999; Todd & Loy, 1991). In this case establishing the validity of the model likely requires collecting all three types of evidence, not only from the human subjects, but also from the neural network model. The hope is to find a close relation between the evidence collected from the human subjects and the evidence collected from the neural network model. Importantly, this match is likely to be highly related to choice of encoding. In other words, a music cognition researcher may not be interested in seeking the encoding that leads to the simplest network, but instead in seeking the encoding that leads to the best match between subject and model.



9.12 References

Bharucha, J. J. (1984). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 16(4), 485-518.

Broze, Y., & Shanahan, D. (2013). Diachronic changes in jazz harmony: A cognitive perspective. Music Perception, 31(1), 32-45. doi: 10.1525/mp.2013.31.1.32

Calvo, P., & Gomila, A. (2008). Handbook Of Cognitive Science: An Embodied Approach. Oxford: Elsevier.

Chemero, A. (2009). Radical Embodied Cognitive Science. Cambridge, Mass.: MIT Press.

Conrad, R. (1964). Information, acoustic confusion, and memory span. British Journal of Psychology, 55, 429-432.

Dawson, M. R. W. (2013). Mind, Body, World: Foundations Of Cognitive Science. Edmonton, AB: Athabasca University Press.

Dawson, M. R. W., Dupuis, B., & Wilson, M. (2010). From Bricks To Brains: The Embodied Cognitive Science Of LEGO Robots. Edmonton, AB: Athabasca University Press.

Demsey, D. (1991). Chromatic third relations in the music of John Coltrane. Annual Review Of Jazz Studies, 5, 145-180.

Dourish, P. (2001). Where The Action Is: The Foundations Of Embodied Interaction. Cambridge, Mass.: MIT Press.

Gibson, J. J. (1979). The Ecological Approach To Visual Perception. Boston, MA: Houghton Mifflin.

Griffith, N., & Todd, P. M. (1999). Musical Networks: Parallel Distributed Perception And Performance. Cambridge, Mass.: MIT Press.

Heidegger, M. (1927/1962). Being And Time. New York: Harper.

Houston, S. (2004). Play Piano In A Flash! (1st Hyperion ed.). New York: Hyperion.

Jarvinen, T. (1995). Tonal hierarchies in jazz improvisation. Music Perception, 12(4), 415-437.

Josephson, M. (1961). Edison. New York: McGraw Hill.

Katz, B. F. (1995). Harmonic resolution, neural resonance, and positive affect. Music Perception, 13(1), 79-108.

Kelley, R. D. G. (2009). Thelonious Monk: The Life and Times of an American Original (1st Free Press hardcover ed.). New York: Free Press.

Krumhansl, C. L. (1990). Cognitive Foundations Of Musical Pitch. New York: Oxford University Press.

Krumhansl, C. L., Bharucha, J. J., & Kessler, E. J. (1982). Perceived harmonic structure of chords in three related musical keys. Journal of Experimental Psychology: Human Perception and Performance, 8(1), 24-36.

Levine, M. (1989). The Jazz Piano Book. Petaluma, CA: Sher Music Co.

Newell, A., & Simon, H. A. (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.

Norman, D. A. (1998). The Invisible Computer. Cambridge, Mass.: MIT Press.

Norman, D. A. (2002). The Design Of Everyday Things (1st Basic paperback. ed.). New York: Basic Books.

Norman, D. A. (2004). Emotional Design: Why We Love (Or Hate) Everyday Things. New York: Basic Books.

Piston, W. (1962). Harmony (3rd ed.). New York: W. W. Norton.

Porter, L. (1998). John Coltrane: His Life And Music. Ann Arbor: University of Michigan Press.

Pylyshyn, Z. W. (1980). Computation and cognition: Issues in the foundations of cognitive science. Behavioral and Brain Sciences, 3(1), 111-132.

Pylyshyn, Z. W. (1984). Computation And Cognition. Cambridge, MA: MIT Press.

Rosner, B. S., & Narmour, E. (1992). Harmonic closure: Music theory and perception. Music Perception, 9(4), 383-411.

Schoenberg, A. (1969). Structural Functions Of Harmony (Rev. ed.). New York: W. W. Norton.

Shapiro, L. A. (2011). Embodied Cognition. New York: Routledge.



Shapiro, L. A. (2014). The Routledge Handbook Of Embodied Cognition (1st ed.). London: Routledge.

Simon, H. A. (1969). The Sciences of the Artificial. Cambridge, MA: MIT Press.

Slonimsky, N. (1947). Thesaurus Of Scales And Melodic Patterns. New York: Coleman-Ross Company, Inc.

Steedman, M. J. (1984). A generative grammar for jazz chord sequences. Music Perception, 2(1), 52-77.

Sudnow, D. (1978). Ways Of The Hand: The Organization Of Improvised Conduct. Cambridge, Mass.: Harvard University Press.

Todd, P. M., & Loy, D. G. (1991). Music And Connectionism. Cambridge, Mass.: MIT Press.

Tymoczko, D. (2006). The geometry of musical chords. Science, 313(5783), 72-74.

Tymoczko, D. (2008). Scale theory, serial theory and voice leading. Music Analysis, 27(1), 1-49. doi: 10.1111/j.1468-2249.2008.00257.x

Tymoczko, D. (2011). A Geometry Of Music: Harmony And Counterpoint In The Extended Common Practice (E-pub ed.). New York: Oxford University Press.

Varela, F. J., Thompson, E., & Rosch, E. (1991). The Embodied Mind: Cognitive Science And Human Experience. Cambridge, Mass.: MIT Press.

Vera, A. H., & Simon, H. A. (1993). Situated action: A symbolic interpretation. Cognitive Science, 17, 7-48.

Waugh, N. C., & Norman, D. A. (1965). Primary memory. Psychological Review, 72, 89-104.

Winograd, T., & Flores, F. (1987). Understanding Computers And Cognition. New York: Addison-Wesley.

Yaremchuk, V., & Dawson, M. R. W. (2005). Chord classifications by artificial neural networks revisited: Internal representations of circles of major thirds and minor thirds. Artificial Neural Networks: Biological Inspirations - ICANN 2005, Pt 1, Proceedings, 3696, 605-610.
