
USING STRANGE ATTRACTORS

TO MODEL SOUND

Submitted to The University of London

for the Degree of Doctor of Philosophy

Jonathan Mackenzie

King's College

April 1994


Abstract

This thesis investigates the possibility of applying nonlinear dynamical systems

theory to the problem of modelling sound with a computer. The particular interest is in

the creative use of sound, where its representation, generation and manipulation are

important issues. A specific application, for example, is the modelling of

environmental sound for film sound-tracks.

Recently, there have been a number of major advances in the field of nonlinear

dynamical systems which include chaos theory and fractal geometry. It is argued that

these provide a rich source of ideas and techniques relevant to the issues of modelling

sound. One such idea is that complex behaviour may be generated from simple

systems. Such behaviour can often replicate a wide range of natural phenomena, or is

of interest in its own right because of its aesthetic appeal. This has often been
demonstrated through computer-generated images, and so an equivalent is sought in the audio

domain. This work is believed to be the first substantial attempt at this.

The investigation begins with a consideration of fractal and chaotic properties of

sound and with a comparison between established approaches to modelling and the

alternatives suggested by the new theory. Then, the inquiry concentrates on strange

attractors, which are the mathematical objects central to chaos theory, and on two

ways in which they may be used to model sound.

The first of these involves using static fractal functions to represent sound time

series. A technique is developed for synthesising complex abstract sounds from a

small number of parameters. A class of these sounds has the novel property that they

are simultaneously rhythms and timbres. It is believed these have potential for use in

computer music composition. Also considered is the problem of modelling a given

time series with a fractal function. An algorithm for doing this is taken from the

literature, shown to be of limited ability, and then improved. The results indicate that

data compression may be achieved for certain types of sound.

The second approach focuses on modelling the dynamics of a sound via the

embedded reconstruction of an attractor from a time series. Two models are presented,

one deterministic, the other stochastic. It is demonstrated that with the first of these,

certain sounds may be modelled such that their perceived qualities are preserved. For

some other signals, although the sound is not so well preserved, many statistical

aspects are. The second model is shown to provide a solution to the film sound-track

problem.

It is concluded that this investigation shows strange attractors to have considerable

potential as a basis for modelling sound and that there are many areas for continued

research.


To

Valerie Duff


Acknowledgements

I would very much like to thank my supervisor, Dr. Mark Sandler, for encouraging

me to begin this research project, for finding the funding for it, and for everything he

has done towards making it such a stimulating and enjoyable experience. I am also

indebted to Solid State Logic for providing the sponsorship and to Chris Jenkins for

arranging it. I doubt whether I would have had the opportunity to pursue the project of

my choice otherwise.

I am enormously grateful to my colleagues at King's College who have always

been helpful, supportive and inspiring. These include Maaruf Ali, Julian Bean, Victor

Bocharov, Rob Bowman, Ian Clark, Chris Dunn, Jason Goldberg, Anthony Hare, Rod

Hiorns, Simon Kershaw, Panos Kudumakis, Anthony Macgrath, Phillipa Parmiter,

Allan Paul, Marc Price, Mark Townsend, Mike Waters, and Jie Yu.

For sharing their knowledge and for always being helpful I would like to thank

Dr. Bill Chambers, Prof. Tony Davies, and Dr. Luke Hodgkin. I am also deeply

grateful to Peter King, Mustaq Mohammed and Talat Malik for their generous

technical support.

Finally, special thanks to Val, my family and friends for their support, enthusiasm,

patience and inspiration and for knowing never to ask "when are you going to finish?"


Contents

Abstract ............................................................................................................... 2

Acknowledgements ....................................................................................................... 4

Contents ............................................................................................................... 5

List of Figures ............................................................................................................... 8

List of Tables ............................................................................................................. 14

List of Sound Examples .............................................................................................. 16

List of Acronyms......................................................................................................... 19

1. Introduction ........................................................................................20

2. Modelling Sound ................................................................................................ 24
2.1. Sound and its Representation ................................................................... 24

2.2. Music composition. .................................................................................. 25

2.3. The Roomtone Problem ........................................................................... 26

2.4. Digital Audio............................................................................................ 27

2.5. The Modelling Framework....................................................................... 28

2.6. Conventional Models ............................................................................... 29

2.6.1. Physical Modelling.................................................................... 29

2.6.2. Additive and Subtractive Synthesis........................................... 29

2.6.3. Frequency Modulation and Waveshaping................................. 32

2.7. Summary .................................................................................................. 33

3. Chaos Theory and Fractal Geometry ................................................................ 34
3.1. Introduction .............................................................................................. 34

3.2. The Significance of Chaos ....................................................................... 35

3.3. Dynamical Systems and State Space........................................................ 36

3.4. Stability .................................................................................................... 37

3.5. Attractors.................................................................................................. 39

3.6. Chaos........................................................................................................ 40

3.7. Visualisation............................................................................................. 42

3.8. Bifurcation................................................................................................ 44

3.9. Statistical Descriptions of Dynamics ....................................................... 47

3.10. Fractal Geometry.................................................................................... 48

3.11. Iterated Function Systems ...................................................................... 53

3.11.1. Contraction Mappings............................................................. 54


3.11.2. The Random Iteration Algorithm............................................ 56

3.11.3. The Shift Dynamical System................................................... 58

3.11.4. The Collage Theorem.............................................................. 59

3.11.5. The Continuous Dependence of the Attractor on the IFS

Parameters .............................................................................. 60

3.12. Summary ................................................................................................ 60

4. Applying Chaos and Fractals to the Problem of Modelling Sound.............................................................62

4.1. The Reasons for Using Chaos Theory................................................. 62

4.2. Diagnosis of Chaotic Behaviour ......................................................... 64

4.2.1. Chaos and Woodwind Instruments ........................................... 65

4.2.2. Chaos and Gongs....................................................................... 66

4.2.3. Fractal Time Waveforms........................................................... 66

4.2.4. 1/f Noise .................................................................................... 67

4.3. Representing Sound Using Chaos and Fractals................................... 71

4.4. Summary ............................................................................................. 73

5. Fractal Interpolation Functions ......................................................................... 75
5.1. Theory ................................................................................................. 75

5.2. The Synthesis Algorithm..................................................................... 78

5.3. Experiments with the Synthesis Algorithm......................................... 80

5.4. Rhythm/Timbres ................................................................................. 85

5.5. Generating Time-Varying FIF Sounds................................................ 87

5.6. A Genetic Parameter Control Interface ............................................... 90

5.6.1. Implementation ......................................................................... 91

5.6.2. Experiments .............................................................................. 95

5.7. Conclusions....................................................................................... 101

6. Modelling Sound with FIFs ............................................................................. 103
6.1. Deriving Interpolation Points from Naturally Occurring Sound Waveforms ... 103

6.2. Mazel's Time Series Models ............................................................. 107

6.3. Comparison with Requantisation ...................................................... 109

6.4. Mazel's Inverse Algorithm for the Self-Affine Model ...................... 114

6.4.1. Initial Results .......................................................................... 118

6.4.2. Error Weighting ...................................................................... 121

6.4.3. Interpolation Point Range Restriction ..................................... 124

6.5. Conclusions....................................................................................... 128


7. Chaotic Predictive Modelling .......................................................................... 131
7.1. Chaotic Time Series .......................................................................... 131

7.2. Embedding ........................................................................................ 133

7.3. The Analysis/Synthesis Model.......................................................... 135

7.4. The Inverse Problem ......................................................................... 138

7.5. A Solution to the Inverse Problem ................................................... 140

7.6. Experimental Technique ................................................................... 143

7.7. Experiments with a Lorenz Time Series ........................................... 148

7.8. Experiments with Sound Time Series ............................................... 155

7.8.1. Air Noises................................................................................ 155

7.8.2. Gong Sounds ........................................................................... 162

7.8.3. Musical Tones ......................................................................... 164

7.9. Conclusions....................................................................................... 167

7.10. Further Work..................................................................................... 172

7.10.1. Using the Same Model with More Sounds ........................... 172

7.10.2. Optimising the Synthetic Mapping ....................................... 173

7.10.3. Stability Analysis .................................................................. 174

7.10.4. Connections with IFS............................................................ 174

7.10.5. Time Varying Sounds............................................................ 177

8. The Poetry Generation Algorithm ................................................................... 178
8.1. Introduction ....................................................................................... 178

8.2. Description of the Algorithm ............................................................ 179

8.3. Analysis of the PGA.......................................................................... 184

8.4. Implementation of the PGA for Sound ............................................. 187

8.5. Results ............................................................................................... 191

8.6. Conclusions....................................................................................... 197

9. Summary and Conclusions..............................................................200

Appendix A. Previously Published Work ........................................................... 209
AES Preprint ................................................................................................. 210

ISCAS '94...................................................................................................... 221

References ............................................................................................225


List of Figures

Figure 1.1 A synthetic cloud, fern and a Julia set [frac90]. ........................................ 20

Figure 2.1 The analysis-synthesis scheme. ............................................................... 25

Figure 2.2 The sound modelling framework. ............................................................ 28

Figure 2.3 A schematic diagram for additive synthesis. ........................................... 30

Figure 2.4 Karplus-Strong algorithm. Top, simplified recursive linear filter and bottom, general delay-line view. ............................................................................... 31

Figure 2.5 The basic units used within the FM (top) and waveshaping (bottom) synthesis techniques. .................................................................................................. 32

Figure 3.1 State space representation of a dynamical system. .................................. 37

Figure 3.2 Illustration of the three regular attractor types. ........................................ 40

Figure 3.3 Sequence of magnifications of the Lorenz attractor showing its fractal, self-similar property. .................................................................................................. 42

Figure 3.4 Two simulations of the Lorenz system for similar initial conditions showing sensitive dependence on initial conditions. .................................................. 42

Figure 3.5 Three phase portraits constructed from a time series of observations of the Lorenz chaotic system. Delay values are: (a) 1, (b) 10, (c) 100. ................................ 43

Figure 3.6 The logistic mapping for λ = 0.9. ............................................................. 45

Figure 3.7 Bifurcation diagram for the logistic mapping with corresponding time series plots. ................................................................................................................. 46

Figure 3.8 The exactly self-similar, triadic Koch curve. ........................................... 49

Figure 3.9 General formula for similarity dimension derived by inspection of standard Euclidean shapes. ........................................................................................ 50

Figure 3.10 Iterative construction of the triadic Koch curve. .................................... 52


Figure 3.11 Area of closed Koch curve (dark grey) is within area of circle (light grey) showing that it is finite. .............................................................................................. 52

Figure 3.12 Three affine contraction mappings on X = R^2 and their single combination, W. .......................................................................................................... 55

Figure 3.13 The repeated application of a contractive mapping, W, to some initial set B, tending to the limit set, or attractor, A. .................................................................. 55

Figure 3.14 Example of the Random Iteration Algorithm (RIA) in operation. The three images show the results of iterating the Markov process (a) ~100, (b) ~300, (c) ~1000 times. ............................................................................................................... 57

Figure 3.15 Examples of RIA attractors where the mappings are weighted with different associated probabilities. ............................................................................... 58

Figure 3.16 Example of an IFS attractor partitioned into three disjoint subsets according to the effect of the three individual contraction mappings on the attractor. ..................................................................................................................................... 59

Figure 4.1 Bifurcation diagram showing a Hopf bifurcation occurring at the threshold of oscillation in a wind instrument as the blowing pressure is increased. ................. 65

Figure 4.2 Time series plots and spectral density forms for 1/f noise compared with white noise and Brown noise. .................................................................................... 69

Figure 4.3 Power spectral densities of wind noise (left) and an industrial roomtone (right) showing 1/f characteristic over the audible range of frequencies. .................. 70

Figure 4.4 A demonstration of the property of continuous dependence of IFS attractors on the parameters that define them. This also illustrates the power of manipulation possible with chaotic models [frac90]. ............................................... 73

Figure 5.1 An example of the effect of three shear maps, w1, w2 and w3, on the area A, and an illustration of one of the vertical scaling factors, d1. ................................. 77

Figure 5.2 The initial arbitrary set, B, and a sequence of five iterations of the deterministic algorithm. ............................................................................................. 81

Figure 5.3 FIF for equally spaced interpolation points derived from a single cycle of a sinewave, but where the vertical scaling factors increase for the mappings from left to right. ............................................................................................................................ 82


Figure 5.4 FIF where x values are spaced according to a square law. Sequence of magnifications of windows is shown in (a)-(d). ......................................................... 83

Figure 5.5 Same interpolation points as Figure 5.4, but with 6 iterations showing the cumulative effect of errors in the algorithm. The bottom plot is a magnification of the middle ~1000 points of the top plot. .......................................................................... 84

Figure 5.6 FIF generated from random x, y and d values for the interpolation points. ..................................................................................................................................... 84

Figure 5.7 (a) (left) FIF generated with random y values, but evenly spaced x. All d = 0.9. (b) (right) FIF generated with random y, but square-law x values. All d = 0.9. ... 85

Figure 5.8 - see Table 5.3. ........................................................................................... 86

Figure 5.9 Development of two rhythm/timbres from rhythmic design, top, through interpolation points, middle, to final waveform, bottom. ........................................... 87

Figure 5.10 Control rule for time-varying FIF sound. Left, pseudocode where (xij, yij) is the ith interpolation point of the jth FIF and dij is the vertical scaling factor for the ith map of the jth FIF. Right, graphical depiction of the effect on the interpolation points through time. ..................................................................................................... 88

Figure 5.11 Left, time plot of the whole waveform generated with the control rule shown in Figure 5.10 with selected magnifications of individual FIFs to show how the sound develops through time. Right, spectrogram of the first half of the sound showing how it contains complex, time-varying partials similar to those found in naturally occurring musical sounds. ............................................................................ 89

Figure 5.12 Pictorial representation of the FIF parameter control used to generate the second example of a time-varying FIF sound. ........................................................... 90

Figure 5.13 Schematic diagram of the model for biological evolution. ..................... 92

Figure 5.14 Schematic diagram of hardware used for the GEN program. ................. 92

Figure 5.15 Example of mutation, (a), and recombination, (b), of FIF parameters. ... 94


Figure 5.16 A single screen-shot from the program GEN. ........................................ 96

Figure 5.17 A sequence of populations generated with the program GEN. In this case, the FIFs are produced from 6 interpolation points. At the start (waveform A - top left) all interpolation points and vertical scaling factors are zeroed. At each stage, 7 mutations are produced and then a single survivor is chosen by the operator (starred waveform), which reappears as waveform A in the next generation. ........................ 98

Figure 5.18 Starting point (top left) and sequence of starred waveforms from Figure 5.17 shown in more detail. .......................................................................................... 99

Figure 5.19 Mutated variants of an FIF that is defined by a relatively large number of parameters. It can be seen (and heard) that when this is the case, low-factor mutations are not distinctive from one another. ....................................................................... 100

Figure 6.1 Results of an experiment to extract interpolation points by decimating a wind sound waveform and then constructing an FIF with them. ............................. 103

Figure 6.2 Original wind sound waveform (top), interpolation of peak points (bottom left), and reconstructed waveform (bottom right). ................................................... 105

Figure 6.3 Section of original wind sound (left) and part of the composite FIF (right) constructed using groups of peak points. ................................................................. 106

Figure 6.4 Mapping of amplitudes in requantisation process. ................................. 110

Figure 6.5 Degradation against compression performance of Mazel's inverse algorithms for a variety of data and model types compared with the theoretically expected performance of requantisation. .................................................................. 113

Figure 6.6 First trial pair of interpolation points on the original time series graph. . 115

Figure 6.7 Mapping of the whole time series to the interval between the first pair of interpolation points. .................................................................................................. 115

Figure 6.8 Maximum vertical extent of part of the original time series between a pair of consecutive interpolation points and the maximum vertical extent of the mapped original time series. The vertical scaling factor is calculated so as to make these two extents equal. ............................................................................................................ 117


Figure 6.9 Error weighting function parameterised by its gradient. ......................... 122

Figure 6.10 Graph of the results shown in Table 6.4. .............................................. 123

Figure 6.11 Comparison of performance between requantisation and the error-weighted version of Mazel's algorithm. The original is 1000 samples of wind noise which is processed as 10x100 sample sections. ........................................................ 124

Figure 6.12 Comparison of performance of the window-restricted inverse algorithm with that of requantisation. The original time series is wind noise and processed as 10x100 sample sections. ........................................................................................... 126

Figure 6.13 Waveform plot of original wind noise (left) and compressed FIF version (right) using the modified inverse algorithm. The compression ratio in this case is 8.1:1, and the SNR is 22.6dB. .................................................................................. 127

Figure 6.14 Column chart showing the performance figures given in Table 6.6 for a variety of different original sound time series. ......................................................... 128

Figure 7.1 The proposed analysis/synthesis model based upon the embedded attractor and measure representation of a sound time series. .................................................. 136

Figure 7.2 Left, an example recursive partition for m=2 and right, the associated search tree. ................................................................................................................ 142

Figure 7.3 Lorenz input, N=10,000, Q=256 and a variety of embedding dimensions, m. .............................................................................................................................. 149

Figure 7.4 Lorenz input, N=10,000, m=7, and a variety of numbers of domains, Q. 150

Figure 7.5 Lorenz input, Q=64, m=7 and a variety of original time series lengths, N. ................................................................................................................................... 151

Figure 7.6 Time series plots from the original Lorenz system (left) and the synthetic one shown as phase portrait in Figure 7.4(f) (right). ................................................ 152

Figure 7.7 Estimates of amplitude probability distributions for original, left, and synthetic, right, time series shown in Figure 7.6. ..................................................... 153


Figure 7.8 Time series plots and phase portraits for: left, original fan rumble sound and right, best synthetic output, rc127. .................................................................... 157

Figure 7.9 Time series plots and phase portraits for some more outputs from the sound model using the fan rumble as input. Note that, for clarity, only about a third of the output length shown in the time series plots appears in the phase portraits. ...... 159

Figure 7.10 Time series plots (first fifth of top plot shown magnified as second plot), power spectra and phase portraits for original wind noise, left, and synthetic version, right. .......................................................................................................................... 161

Figure 7.11 Time series plots, phase portraits and amplitude histograms for original, left, and synthetic, right, lightly-struck gong sound. Both amplitude histograms were computed with 10,000 samples and 100 bins. .......................................................... 163

Figure 7.12 Time series plots, phase portraits and amplitude histograms for original, left, and synthetic, right, hard-strike gong sound. Both amplitude histograms were computed with 10,000 samples and 100 bins. .......................................................... 164

Figure 7.13 Time series plots, power spectra and phase portraits for original, left, and synthetic, right, tuba tones. ....................................................................................... 166

Figure 7.14 Time series and phase portraits for original, left, and synthetic, right, saxophone tones. ....................................................................................................... 166

Figure 7.15 Relative one-step prediction errors for the best results found for each of the time series. .......................................................................................................... 168

Figure 7.16 Autocorrelation functions for original, left, and synthetic, right, gently struck gong sound. The upper plot shows the function up to 8,000 delays, and the lower up to 100 delays. Both were calculated by convolving 10,000 samples of the time series with itself for different delays. ................................................................ 171

Figure 8.1 The top line shows the interdependence of the components of the RIA version of an IFS. The bottom line shows a suggested path to obtain a solution to the inverse problem. ........................................................................................................ 179

Figure 8.2 Input to the algorithm treated as a circular sequence. ............................. 181

Figure 8.3 Part of the state space, X, corresponding to an example PGA showing some of the possible states and their associated transitions. .................................... 185


Figure 8.4 Crossfade envelopes applied to the beginning and end of the original time series, which are then added together to form the modified time series. This is then stored in the circular register so that there is no amplitude discontinuity between its end and its beginning. ............................................................................................... 191

Figure 8.5 Time domain plots of the original roomtone showing 300 (left) and 3000 (right) samples. ......................................................................................................... 194

Figure 8.6 Time domain plots of output time series when (a) I=300, L=1, (b) I=3000, L=3, and (c) I=300, L=4. .......................................................................................... 194

Figure 8.7 Comparison between original (left) and synthetic time series (right) showing: (a)&(b) time domain plots, (c)&(d) power spectral densities calculated by averaging eleven 4096-point FFTs, and (e)&(f) amplitude histograms calculated from 30,000 samples. ........................................................................................................ 195


List of Tables

Table 2.1 A summary of possible sound types. After [ross82]. ................................. 25

Table 5.1 (left) Example set of interpolation points and vertical scaling factors that define the FIF shown in Figure 5.2. ............................................................................ 80

Table 5.2 (right) Vertical scaling factors used in generating Figure 5.3. ................... 80

Table 5.3 and Figure 5.8 Input data and waveform plot of the resulting FIF that is a rhythm/timbre. ............................................................................................................ 86

Table 6.1 Summary of the results obtained by Mazel for his four FIF-based models/inverse algorithms. ....................................................................................... 109

Table 6.2 Summary of results for reimplementation of Mazel's algorithm for the self-affine model. Each original time series of length Ttot has been processed as m=10 sections of length T=100. ......................................................................................... 119

Table 6.3 Running the algorithm with wind noise as the original time series for a variety of section lengths T. ..................................................................................... 120

Table 6.4 Results of error weighting the inverse algorithm for a range of weighting function gradients. The original time series is wind noise and is processed as 10x100 sample sections. ........................................................................................................ 122

Table 6.5 Performance of the modified FIF inverse algorithm with a specified window restricting the range of the trial interpolation point. ................................... 126

Table 6.6 Performance figures for the window-restricted inverse algorithm using a variety of sound time series. Each original time series is processed as 10x100 sample sections and the restriction window is set at l=15 and r=25 samples. ...................... 127

Table 7.1 Summary of results using the fan rumble sound as input to the dynamic model. ....................................................................................................................... 156

Table 7.2 Summary of analysis parameters for best results using gong sounds. ...... 162

Table 7.3 Analysis details for the musical tones. ..................................................... 165


Table 8.1 Example of the PGA acting on a short paragraph of text for a variety of values of the seed length parameter. ......................................................................... 180

Table 8.2 Example sequence of iterations of the PGA. ............................................ 182

Table 8.3 Simple example showing how the preprocessing reorders the original input sequence. .................................................................................................................. 189

Table 8.4 Summary of results obtained with the PGA and an industrial roomtone as the original time series. (Numbers in brackets are experiment identifiers.) ............. 192

Table 8.5 Summary of results for the PGA used with other roomtones having different qualities. .................................................................................................... 196

Table 8.6 Summary of results obtained with the PGA and a variety of other background sounds. .................................................................................................. 197


List of Sound Examples

All sounds are created by playing 16-bit sound files at 48kHz or 44.1kHz sample-

rate unless otherwise stated. The sample-rate is indicated by the suffix of the sound

file name given in brackets after each description. For example, '.441' indicates an

original sound recording made with a sample-rate of 44.1kHz or a synthetic version

played-back at that rate. The suffix '.mbi' is used to indicates an abstract waveform

with no intrinsic sample-rate. These files are played at 48kHz.

Playback is via a Digital Audio Labs 'CardD Plus' system connected to an IBM

compatible P.C. This allows an AES/EBU compatible, serial digital audio data-stream

to be generated from the sound file. This is then passed to a Sony TCD-D10 digital

audio tape (DAT) recorder which is used as the digital-to-analogue device.

Chapter 5

1. FIF derived from 17 equally x-spaced interpolation points taken from a single

sinewave cycle, 5 iterations. (sine_5.mbi) .................................................................. 81

2. Same as Sound 1, but with increasing vertical scaling factors. (sine3.mbi) ......... 81

3. FIF derived from 129, square-law x-spaced interpolation points taken from a single

sinewave cycle, 3 iterations. (sine9_3.mbi) ................................................................ 82

4. Same waveform used in Sound 3, but played as a sequence where the speed of

playback is halved at each stage. (sine9_3.mbi) ......................................................... 83

5. FIF derived from randomised interpolation points and vertical scaling factors.

(rand4.mbi).................................................................................................................. 84

6. FIF derived from interpolation points whose y-values are randomised, but that are

regularly x-spaced. (rand2.mbi)................................................................................... 84

7. Same as Sound 6, but with square-law x-spacing. (rand3.mbi) .............................. 84

8. Original FIF rhythm/timbre. (fif1.mbi) ................................................................... 85

9. Same waveform used in Sound 8, but played as a sequence where the speed of

playback is halved at each stage. (fif1.mbi) ............................................................... 85

10. First designed FIF rhythm/timbre. (rhy2_1_x.mbi) .............................................. 86

11. Second designed FIF rhythm/timbre. (rhy4_4.mbi) .............................................. 86

12. Percussive sounding, time-varying FIF. (tv1.mbi)................................................ 89


13. Second example of a time-varying FIF. (tv2.mbi) ................................................ 89

14. Audio output from the program GEN which accompanies Figure 5.16. Each of the 8 sounds is a member of a single evolved population of FIFs. Played at 48kHz. ........ 95

15. Sounds to accompany Figure 5.17. Each of the 8 sounds is the chosen survivor of

a sequence of generations produced with GEN. Played at 48kHz. ............................. 96

16. Concatenated sequence of ~15 short, evolved FIFs. (mbi1log.mbi)..................... 97

17. Concatenated sequence of 4 related FIF rhythm/timbres. (goodone.mbi) ............ 97

18. Audio output from GEN which accompanies Figure 5.19. Each sound is the

member of one generation evolved from FIF parameters similar to those used in

Sound 3. It can be heard how there is little to distinguish the mutated offspring. Played

at 48kHz. ................................................................................................................... 100

Chapter 6

19. FIF whose interpolation points are the peak-points of a wind noise waveform.

(wp1.mbi) .................................................................................................................. 105

20. As Sound 19, but using groups of peak-points. (wp2.mbi)................................. 106

Chapter 7

All the examples from Chapter 7 are presented as pairs of the original sound and

the synthetic version produced with the chaotic predictive model.

21. Original fan rumble air-noise. (fan_rmb5.48)..................................................... 157

22. Synthetic version of above. (rc127b.48) ............................................................. 157

23. Original wind noise. (wind6.48) ......................................................................... 160

24. Synthetic version of above. (rc162b.48) ............................................................. 160

25. Original lightly-struck gong sound. (gong4.48) .................................................. 162

26. Synthetic version of above. (rc115b.48) ............................................................. 162

27. Original hard-strike gong sound. (gong6.48) ...................................................... 162

28. Synthetic version of above. (rc148b.48) ............................................................. 162

29. Original tuba tone. (tuba2.48) ............................................................................. 165

30. Synthetic version of above. (rc175x.48) ............................................................. 165


31. Original saxophone tone. (sax9.48) .................................................................... 165

32. Synthetic version of above. (rc108x.48) ............................................................. 165

Chapter 8

33. Original industrial roomtone. (rmt4.441) ............................................................ 192

34. Synthetic version of Sound 33 produced by PGA where I=300 and L=1.

(rmt4_111.441).......................................................................................................... 192

35. As 34, but I=3,000 and L=2. (rmt4_212.441) ..................................................... 192

36. As 34, but I=3,000 and L=3. (rmt4_213.441) ..................................................... 192

37. As 34, but I=3,000 and L=4. (rmt4_214.441) ..................................................... 192

38. As 34, but I=30,000 and L=2. (rmt4_312.441) ................................................... 192

39. As 34, but I=3,000 and L=5. (rmt4_315.441) ..................................................... 192

40. Original laboratory roomtone, played at 48kHz. (lab_rmt.48)............................ 196

41. Synthetic version of above produced with PGA, played at 48kHz.

(lab_313.48) .............................................................................................................. 196

42. Original 'rumble-like' industrial roomtone. (rmt11.441)..................................... 196

43. Synthetic version of above produced with PGA. (rt11_314.441) ....................... 196

44. Original industrial roomtone with drone. (rmt15.441)........................................ 196

45. Synthetic version of above produced with PGA. (rt15_314.441) ....................... 196

46. Original river sound. (river.48) ........................................................................... 197

47. Synthetic version of above produced with PGA. (rive_313.48) ......................... 197

48. Original wind noise. (wind1.48) ......................................................................... 197

49. Synthetic version of above produced with PGA. (wind_313.48)........................ 197

50. Original audience applause sound. (applause.48) ............................................... 197

51. Synthetic version of above produced with PGA. (appl_312.48)......................... 197

52. Original rainforest ambience. (ecuador.48)......................................................... 197

53. Synthetic version of above produced with PGA. (ecua_314.48) ........................ 197

54. Original speech extract. (speech.48) ................................................................... 199

55. Synthetic version of above produced with PGA. (sp_pga.48) ............................ 199


Summary of Acronyms

DAT Digital Audio Tape

DSP Digital Signal Processor

FFT Fast Fourier Transform

FIF Fractal Interpolation Function

FM Frequency Modulation

IFS Iterated Function System

jpdf joint probability density function

LPC Linear Predictive Coding

pdf probability density function

PGA Poetry Generation Algorithm

RIA Random Iteration Algorithm

rms root mean square

SDS Shift Dynamical System

SNR signal to noise ratio


Chapter 1

Introduction

This thesis is about applying science and technology to the arts. In particular, the

science is that of chaos theory, which includes fractal geometry, the technology is the

computer, and the medium of interest, sound. Fractals and chaos are recent

developments which are revolutionising our understanding of the complex and

irregular nature of the world. Chaos theory is concerned specifically with the

behaviour of nonlinear dynamical systems. It is about the realisation that simple,

deterministic systems can exhibit complex, unpredictable behaviour. Fractal geometry

deals with a class of forms that are not accounted for by conventional, Euclidean

geometry. The two overlap with the concept of a strange attractor which both

embodies the nature of chaotic systems and is itself a fractal object. The relevance and

use of chaos and fractals are currently spreading through a diverse range of subjects. A

number of developing areas of interest are characterised by the overlap of both

scientific and artistic concerns. In particular, two subjects have emerged that have

considerable popularity: visual art and music. Both combine fractal and chaotic

models with computer technology to provide powerful tools for artistic

experimentation. The aim of this work is to seek a parallel to this, but involving

sound.
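
The claim that simple, deterministic systems can behave unpredictably is easily illustrated. The minimal Python sketch below iterates the logistic mapping, which is examined in Chapter 3, from two nearly identical starting points; the parameter value and starting points are arbitrary illustrations rather than values used in the work reported later.

    # A minimal sketch (illustrative only): the logistic mapping x -> r*x*(1-x),
    # iterated from two nearby starting points to show sensitive dependence on
    # initial conditions, a hallmark of chaotic behaviour.
    def logistic_orbit(x0, n=40, r=4.0):
        xs = [x0]
        for _ in range(n):
            xs.append(r * xs[-1] * (1.0 - xs[-1]))
        return xs

    a = logistic_orbit(0.400000)
    b = logistic_orbit(0.400001)          # initial conditions differ by only 1e-6
    for k in (0, 10, 20, 30, 40):
        print(k, round(a[k], 6), round(b[k], 6))
    # By around the 30th iterate the two orbits are completely uncorrelated,
    # even though the generating rule is simple and fully deterministic.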

Consider the images shown in Figure 1.1. These

are examples of the power of fractals and chaos. Using only very simple models it is

possible to create images that can be either complex abstract forms or realistic replicas

of natural objects. The question is, can the same be found in the acoustic domain? For

example, could a complex, naturally occurring sound be represented with a simple

model? Does there exist an aural equivalent of the Julia set?

Figure 1.1 A synthetic cloud, fern and a Julia set [frac90].


Interest in fractal music has concentrated on the arrangement of sequences of notes

with reference to fractal or chaotic models. Although the end product is audio, the

actual sounds used are conventional natural or synthetic ones (for example, see
[pres88, gogi91 and jone90]). The time scale on which fractals and chaos are being

used for music, then, is different to that of the sounds themselves. Musical

fluctuations range from thousandths of Hertz up to several Hertz. Audio fluctuations,

however, range from hundreds of Hertz to tens of thousands of Hertz. An important discovery

that supports the use of fractals and chaos for music composition is that, when

analysed, music from a wide range of cultures and historical periods is found to have

fractal properties [voss78, hsu90 and hsu91]. It has been suggested, however, by

Benoit Mandelbrot, the inventor of the term fractal, that such properties should not

extend beyond the musical structure to the sounds themselves as these are governed

by different mechanisms [mand83].

But why should this necessarily be the case? What about the complex and

irregular side of musical sound, for example the hiss of a breathy saxophone, or the

crash of a cymbal? Also, what about non-musical sound? All around us there are

complex and irregular sounds generated by our environments: a burbling brook,

splashing water, the roaring of the wind, the rumble of thunder and the variety of

screeching, scraping, buzzing and humming noises made by machinery. Is it, perhaps,

that these sounds represent an aural equivalent to the shapes found in nature that have

been neglected by Euclidean geometry and then rediscovered as fractals? Criticising

the conventional Fourier approach to modelling musical sound, the contemporary

composer Iannis Xenakis has said:

"It is as though we wanted to express a sinuous mountain silhouette by portions of

circles." [xena71]

Compare this to what Mandelbrot says in the introduction to his 'The Fractal

Geometry of Nature':

"Clouds are not spheres, mountains are not cones, coastlines are not circles, and

bark is not smooth, nor does lightning travel in straight lines." [mand83]

This thesis, then, presents an exploratory study into the idea of using chaos theory

and fractal geometry to model sound. Apart from the interest in this as a research

topic, the work is practically motivated with the aim of developing computerised tools

that would allow control over complex and irregular sounds for creative uses. The

potential applications for such tools include computer music composition and the

generation of sound effects for film and television.


The overall design of this thesis is as follows: Chapters 2, 3 and 4 present the

background to this thesis and develop specific problems on which to work. Then

Chapters 5, 6, 7 and 8 present original contributions towards the solution of these

problems. Each of these chapters contains its own conclusions and a discussion of

further work where relevant. Chapter 9 contains a summary of the thesis and some

general conclusions. An appendix is included which contains copies of previously

published papers on this work and the thesis ends with a full list of references.

Throughout the thesis, references are made to sound examples which are presented on

an accompanying cassette tape. The sound examples are listed, along with all figures

and tables, after the contents pages. Also included is a summary of acronyms for

reference. The content of each chapter is previewed below.

Chapter 2 defines what is meant by a sound model. It considers what sound is, and

the general concept of its representation via the procedures of analysis and synthesis.

Some specific applications are described, including 'the roomtone problem', which

allows a functional description of a model to be developed. Brief reviews of some

well known models fitting this description are given including some of their

advantages and limitations.

Chapter 3 presents a review of chaos theory and fractal geometry. This includes an

outline of some main features and their significance. The emphasis is on

understanding how complex behaviour arises from simple systems, the importance of

strange attractors, and the introduction of Iterated Function Systems (IFS), which

provide a useful practical framework for manipulating strange attractors.

In Chapter 4 the issue of applying the ideas of chaos theory and fractal geometry

to the problem of modelling sound is considered. It is argued that both appear to have

potential use, but that two main questions are raised. Firstly, on a diagnostic level: are

sounds chaotic or fractal? Positive evidence is collected both from the literature and

from original work. The second question is then a practical one: in what way can

sound be represented with chaos or fractals? The conclusion is to concentrate on using

strange attractors in two different ways with an emphasis on involving IFS.

Chapter 5 is concerned with using IFS strange attractors to produce synthetic

sound by generating waveforms with Fractal Interpolation Functions (FIF), a class of

IFS. A basic technique is designed that is then advanced in several ways. The most

important result is the discovery of a new class of sounds that are simultaneously

rhythms and timbres. With these techniques complex sounds may be generated with

small amounts of data and are demonstrated to have potential for musical applications.

Chapter 6 keeps the theme of FIF, but considers the analysis and synthesis of a

given sound. An algorithm is taken from the literature which appears suitable for this


task. It is shown, however, to be inadequate, a reason found, and the algorithm

improved. Results indicate that some degree of data compression may be obtained for

certain sounds.

Chapter 7 is concerned with the problem of modelling the dynamics of a sound via

a strange attractor. The assumption is made that a chaotic system is responsible for a

digital audio time series. The system may then be reconstructed from the time series

with a technique known as embedding. Because of the properties preserved by

embedding, the construction of another chaotic system that approximates the

embedded one should produce a time series that is statistically similar to the original.

An approach to this problem is considered which combines techniques taken from

work on the nonlinear prediction of time series with an original method inspired by

the Shift Dynamical System (SDS) version of an IFS. An analysis/synthesis algorithm

is developed and a number of experiments performed. The algorithm is shown to be

capable of modelling known chaotic systems from their time series. Also, despite

some difficulties, the algorithm is capable of successfully reproducing some natural

sound so that it is perceptually similar to the original.

Chapter 8 is also concerned with the problem of modelling the dynamics of a

sound in an embedded state space setting. The model considered, however, is the

Random Iteration Algorithm (RIA) version of an IFS where a Markov chain is used to

model the embedded invariant measure. In the course of this investigation, an

algorithm is developed which solves the roomtone problem for certain ambient

sounds.

Chapter 9 presents a summary of the thesis and some general conclusions on the

subjects of inverse problems, algorithmic complexity and developments of the work.


Chapter 2

Modelling Sound

This chapter develops a working definition of a sound model. It will consider what

sound is and its representation within an analysis/synthesis framework. Some possible

applications of such a model will be discussed including a specific one concerning

film sound-track editing, known as 'the roomtone problem'. This leads to a set of

useful functions that define the model. Also, a brief review of established modelling

techniques, their advantages and limitations is included.

2.1. Sound and its Representation

What is sound? It can be defined as either an auditory sensation perceived by the

mind, or as the physical disturbance that gives rise to such a sensation [ross82]. A

practical model for sound has, in some way, to represent it in an appropriate form.

Starting from this definition of sound there are a number of levels on which this

representation could take place. Consider these as ordered from the outside in: on the

outside level, a model could be made of the complete physical system that is

responsible for the sound. This might include the source of the disturbance and its

reverberant environment. A list of possible disturbances is shown in Table 2.1.

Secondly, this model may be simplified to include only that which is relevant to

describing the pressure fluctuations in the air at a single point; for example at the ear

or a microphone. Next, a model could be made for the time waveform created by

recording those pressure fluctuations at a single point with little or no

consideration of the physical system that created it. The waveform is then an abstract

pattern which is to be modelled. Finally, the model may account for just the

perception of the sound, so that an accurate representation of the time waveform is not

necessary, but a representation is needed that just contains the relevant information to

capture the essential characteristics of the sound.

At whatever level the representation is made, a useful framework within which to

test its validity is provided by the analysis-synthesis scheme shown in Figure 2.1

[riss82]. The important feature is that a listener judges how good the representation is

at capturing the characteristics of the sound. In order to refine this modelling

framework, it will be useful to consider some of the applications where sound models

are, or might be used.


Figure 2.1 The analysis-synthesis scheme (sound is analysed into a representation, resynthesised into sound, and judged by a listener).

Physical Disturbance: Example

vibrating solid bodies: metal bar, speaker cone, violin body
vibrating air column: pipe organ, woodwind instrument
flow noise in fluids due to turbulence: jet engines, air leaking under pressure, wind noise
interaction of moving solid with fluid, or moving fluid with solid: rotating propeller or fan blade; air flow in duct or through grill, water in pipe, waves breaking on sea shore
rapid changes in temperature or pressure: thunder and other sounds caused by electrical discharge, chemical explosion
shock waves caused by motion or flow at supersonic speed: supersonic boom caused by jet aircraft

Table 2.1 A summary of possible sound types. After [ross82].

2.2. Music Composition

An important aspect of music composition is, obviously, the control over the type

and quality of sound used. This century has seen the use of electronic and, more

recently, computer based techniques grow from the experimental to the mainstream.

Typically, such techniques involve obtaining musical sound and processing it to

modify it, or generating it entirely synthetically. Of importance are the degrees of


musical usefulness and flexibility that are offered by a technique coupled with the

ease and efficiency with which it can be executed.

Imagine the example of a drum synthesiser. What might be its attractive features

for a composer? It might be able to take the recording of an original drum sound and

reproduce it so as to retain its relevant characteristics, discarding any perceptually

unimportant information in the process. It might then allow the sound to be modified

in a way related to its physical attributes, for example, to be able to change the sound

as if it came from a larger version of the same drum, or one that had a tighter skin and

had been struck with a different beater. Furthermore, the synthesiser might allow drum

sounds to be generated that it would not be possible to create with real instruments.

A more detailed discussion of sound modelling techniques used for music

composition is given in the forthcoming sections 2.5 - 2.8.

2.3. The Roomtone Problem

Another area of creative sound use is film sound-track editing. This, as with music

composition, generally involves manipulating sound in a number of ways except that

often the sound is non-musical. A good example of this is the use of sound effects.

Here, the desire is to add certain sounds to a film to enhance or complement what is

taking place visually. Traditionally, this is done by simulating the appropriate sounds

with a variety of acoustic devices or making use of large reference libraries of

recordings. It is, however, often problematic and time consuming to get exactly the

desired sound. A specific example of this is the roomtone problem which was posed

by the company that sponsored this research.

The roomtone problem arises during post-production editing of a film sound-track.

Often, due to problems that have occurred with the location filming, it is necessary to

replace sections of the original sound-track at a later date. For example, this can

involve having them dubbed by the original actors in an acoustically dry sound studio.

The problem occurs when the new pieces of sound track are inserted into the original

as there is often a noticeable lack of background sound. As these background sounds

tend to be characteristic of internal locations, they are known as roomtones. One

traditional solution to this problem involves referring to libraries of roomtone

recordings to find a matching sound. It is often difficult, however, to find exactly the

right sound and the process can also be time consuming. Another solution is to make

use of small snippets of the roomtone found in places on the original recording, for

example between lines of dialogue. These may be spliced together, or looped to form

as long a piece as is necessary. As with the other solution, this can be an intricate and


time consuming process, the results often not good enough because the splices and

loops are audible.

An ideal solution to this problem, then, would be some form of sound model that

is able to capture certain essential characteristics of the roomtone from a small

original sample and then produce greater quantities of a synthetic version.

Both the examples of the drum synthesiser and the roomtone problem illustrate a

certain type of creative application for sound models. Generally, the need is for the

model to capture essential characteristics of the sound; for it to allow useful

manipulation of the sound; and/or for it to generate synthetic sound. An important

aspect of such models is that the representation involves a set of parameters. These are

the variables of the model that, with the particular representation, form all the

information extracted by the analysis, and/or used by the synthesis. So for the drum

model, the parameters might include the physical attributes of the drum, or for the

roomtone model, the extract of original sound.

2.4. Digital Audio

Being more specific about the sound model, it is assumed that it will operate

within a computer and therefore rely on digital audio as an intermediate

representation. This brings the enormous advantage that the modelling process may be

implemented as a computer program, which makes it highly flexible, and convenient

to develop [math82]. Digital audio satisfies the definition of a representation for

sound that has been given already. It is a discrete time, discrete amplitude model for

the time waveform generated from recording sound at a single point in space. It

preserves perceived information in the form of all frequencies contained within the

sound up to one half of the sampling frequency. This is guaranteed by Nyquist's

sampling theorem [nyqu28]. It is, however, unwieldy, in that a large amount of data is

required for good quality representation. For example, the industry standard of a

48kHz sampling rate and 16 bits per sample [aes85] means that approximately one

million bytes of data are required to represent ten seconds of sound; this data not

being in a form that is obviously related to the perceived characteristics of the sound.

This is therefore another reason for further representation of the sound waveform: so

as to reduce the amount of parameter data. Assuming the use of digital audio and

therefore computers also means that the model has to perform its desired functions

within the constraints imposed by the processing ability of the computing devices

used.


2.5. The Modelling Framework

Following the discussion developed within this chapter, then, a working functional

description of a sound model is summarised as follows. A sound model is of use if:

1) it can represent the essential perceived characteristics of the sound;

2) there is less parameter data than there is original sound data;

3) the parameter data is of a form such that its manipulation has a useful or

interesting effect on the sound;

4) it can generate new sounds, or replicas of naturally occurring ones, from a little

data and/or a simple model.

Although much is known for particular situations, it is very difficult to say, in

general, what physical attributes of the sound it is sufficient to preserve in the

representation so as to satisfy 1). This is still an open question in psychoacoustics [see

deut82]. Point 2) on its own may also be described as data compression. Although this

tends to be an attractive feature of a model in terms of reducing the amount of storage

required, it is considered here also in combination with 3) in the sense that the

parameters are more manageable if there are fewer of them. The synthesis capability of

the model, 4), may be derived from the analysis model and used by supplying it

modified, or artificial parameters, or it may exist on its own as a synthesis-only

technique.

It has also been assumed that the model will operate on a digital audio

representation so that it can operate within a computer. A more detailed diagram of

the sound modelling framework, then, is shown in Figure 2.2.

Figure 2.2 The sound modelling framework (sound at a microphone gives a time waveform, which is sampled and quantised into digital audio; analysis produces a parameterised representation that an operator may modify; synthesis returns digital audio, which is reconstructed and amplified at a loudspeaker).


Now that a general modelling framework has been defined, the next section gives

some brief reviews of particular, well known representations that fit this description.

These serve to illustrate the points made so far, and act as a reference when the issue

of modelling sound using chaos theory is discussed in Chapter 4.

2.6. Conventional Models

2.6.1. Physical Modelling

Physical modelling is a synthesis-only technique that is used to generate musical

sound from a computer representation of the physical system responsible for that

sound. The system can include the action of the musician on the instrument, and the

instrument itself. The system is usually partitioned according to physical, functional or

computational criteria which in fact often coincide. So for example, a violin may be

divided into the bow, strings, bridge and soundboard as separate coupled physical

systems; or into an excitation part (bow on string) that feeds a resonator (string,

bridge, sound board); or into a nonlinear oscillator (exciter) that is input to a linear

filter (resonator).

The appeal of physical modelling is that sounds may be created from a purely

theoretical basis and that the models and parameters are in a form that can be

intuitively understood by the user. The main disadvantage is that despite much basic

theory being known about the physics of musical sound generation, often the models

resulting from a direct implementation of the equations produce sounds that are flat

and lifeless [riss82]. This suggests that there are therefore many subtle aspects of

sound production that are important to the highly sensitive perceptual mechanisms of

the ear and brain that are not included in the basic theory. This is an area of current

research [cmj92].

2.6.2. Additive and Subtractive Synthesis

Additive and subtractive synthesis are terms used to cover a range of analysis-

synthesis techniques used for modelling musical instrument and voice sounds and

which rely on spectral representations of the time waveform. As mentioned above, a

number of such sounds can be presumed to be the product of some form of excitation

feeding a resonator. A time-varying spectral analysis of the sound can reveal these

components in a form that then suggests suitable further representations. For example,

such an analysis shows a bowed violin sound to consist of an approximately periodic

excitation, revealed as a set of harmonically related spectral lines, or partials, within


an overall spectral envelope, which is attributed to the resonances of the violin body.

A similar result can be found for voiced speech sounds, where the resonances, also

called formants, vary in time. Unvoiced speech sounds, however, show a broad-band

spectrum modulated by the formant envelope.

Additive synthesis seeks to regenerate the sound by adding together a set of

sinewaves whose frequency and amplitude 'trajectories' vary in time [serr90, riss82].

A diagram of this is shown in Figure 2.3. The trajectories are extracted from the

spectral analysis using a variety of methods. In this form, however, a large amount of

parameter data can be generated. It has been shown, however, that it is the overall

trend of the trajectories that is of greatest perceptual importance and their

approximation with simple piece-wise linear functions allows a considerable degree of

data reduction while maintaining the quality of the reproduced sound [grey75].

Modification of these functions then also allows musically interesting transformations

of the sound.

Figure 2.3 A schematic diagram for additive synthesis (a bank of sinewave generators, each controlled by amplitude and frequency trajectories, summed to form the output).
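To make the scheme of Figure 2.3 concrete, a minimal sketch of an additive synthesiser is given below (written here in Python purely for illustration; the partial frequencies, breakpoint envelopes and sampling rate are arbitrary example values rather than data from any analysis):

import numpy as np

def breakpoint_env(times, values, t):
    # piece-wise linear trajectory evaluated at the sample times t
    return np.interp(t, times, values)

def additive_synth(partials, duration=1.0, sr=48000):
    # sum a bank of sinewave generators with time-varying amplitude and frequency;
    # partials is a list of (amp_times, amp_values, freq_times, freq_values)
    t = np.arange(int(duration * sr)) / sr
    out = np.zeros_like(t)
    for amp_t, amp_v, freq_t, freq_v in partials:
        amp = breakpoint_env(amp_t, amp_v, t)
        freq = breakpoint_env(freq_t, freq_v, t)
        phase = 2 * np.pi * np.cumsum(freq) / sr   # integrate frequency to get phase
        out += amp * np.sin(phase)
    return out

# Example: three harmonically related partials with simple attack-decay envelopes.
partials = [([0.0, 0.05, 1.0], [0.0, 1.0 / k, 0.0], [0.0, 1.0], [220.0 * k, 220.0 * k])
            for k in (1, 2, 3)]
signal = additive_synth(partials)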

Additive synthesis works well at representing certain sounds to a high degree of

perceptual accuracy. These are ones with a well defined partial structure arising from

periodic excitation and/or systems with simple vibrational modes. It is, however,

limited in its capability to represent complex or noisy sounds, i.e. ones with broad-

band spectral structures.

Subtractive synthesis also seeks to regenerate the sound using the spectral

information. It does this in the opposite sense to additive synthesis by starting with a

spectrally rich input that is then refined with a time varying filter. The excitation may


be periodic or noise-like, to give harmonic or wide-band spectral structure

respectively. The filter then shapes this to provide the formant envelope.

A powerful method for estimating suitable filters is linear prediction [makh75,

moor90]. This encompasses a number of techniques that allow the estimate of

parameters for a digital, recursive linear filter from the original time series. These

filters are of the form,

y_n = x_n + \sum_{i=1}^{M} b_i y_{n-i}

where x is the excitation input, y the output, b the filter coefficients, and M the filter

order, one half of which corresponds to the number of formant peaks.

This technique is used widely for speech modelling where between 3-7 formants

are required to adequately represent the sound, and so provides a considerable degree

of data reduction. Attempts at modelling drum sounds suggest that approximately 100

are necessary [sand89]. This technique offers the potential for modification of the

individual resonances or the excitation so as to transform the sound in an intuitive

way. There are difficulties, however, associated with the numerical manipulation and

implementation of the high order filters required [sand92].
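A minimal sketch of the recursive synthesis filter defined above is given below (Python, for illustration only; the filter coefficients and white-noise excitation are arbitrary stable choices, not values estimated from a real sound):

import random

def allpole_filter(excitation, b):
    # recursive synthesis filter: y_n = x_n + sum_{i=1..M} b_i * y_{n-i}
    M = len(b)
    y = [0.0] * len(excitation)
    for n, x in enumerate(excitation):
        acc = x
        for i in range(1, M + 1):
            if n - i >= 0:
                acc += b[i - 1] * y[n - i]
        y[n] = acc
    return y

# Example: a white-noise excitation shaped by a single resonance (order M = 2).
excitation = [random.gauss(0.0, 1.0) for _ in range(48000)]
output = allpole_filter(excitation, b=[1.0, -0.5])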

A much simplified synthesis-only derivative of the recursive filter model, known

as the Karplus-Strong algorithm, has been found to generate certain sounds very

effectively. These include plucked string, drum and electric guitar timbres [karp83,

jaff83, sull90]. The simplification is in having high order filter models, but with all

the coefficients set to zero except the higher index ones. Variants include the insertion

of other elements, for example randomly controlled switches and nonlinearities, in the

feedback path. It is therefore equivalently described as a delay-line with feedback via

some kind of modifier. Both these views are shown in Figure 2.4. Typically, the sound

is generated by inputting a burst of noise, or a simple periodic waveform to the delay

line.

Figure 2.4 The Karplus-Strong algorithm. Top, simplified recursive linear filter (a chain of unit delays with coefficients); bottom, general delay-line view (a delay of D samples fed back through a modifier).
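The delay-line view of the algorithm can be sketched in a few lines (Python, for illustration only; the delay length, two-point averaging modifier and noise excitation are assumptions chosen to give the familiar plucked-string behaviour rather than details taken from [karp83]):

import random

def karplus_strong(delay_length, num_samples, decay=0.996):
    # delay-line with feedback through a simple two-point averaging modifier
    line = [random.uniform(-1.0, 1.0) for _ in range(delay_length)]   # noise burst
    out = []
    for n in range(num_samples):
        current = line[n % delay_length]
        following = line[(n + 1) % delay_length]
        out.append(current)
        # the modifier: average two neighbouring samples and apply a small decay
        line[n % delay_length] = decay * 0.5 * (current + following)
    return out

# Example: roughly a 110 Hz plucked-string tone at a 44.1 kHz sampling rate.
tone = karplus_strong(delay_length=401, num_samples=44100)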


Finally, a technique for combining both additive and subtractive synthesis has also

been proposed [serr90].

2.6.3. Frequency Modulation and Waveshaping

Frequency modulation (FM) and waveshaping are related synthesis-only

techniques that allow the generation of sounds with complex line spectra using simple

models [chow73 and lebr79]. A basic unit of each technique is shown in Figure 2.5.

The units are then combined by either adding several outputs together, or nested so

that the output of one forms the input to another. The parameters inputted to the

model are accessed directly by the user, and/or controlled by simple functions to

generate time-varying sounds.

To their advantage, the sounds produced by these models are often approximate

replicas of musical ones. Both harmonic and inharmonic sounds may be simulated that

are like those generated from string or wind, and percussive instruments, respectively.

It is also possible to generate a wide range of abstract sounds. The relatively small

number of parameters involved allows for easy experimentation by the user and the

simplicity of the models enables them to be easily implemented.

Figure 2.5 The basic units used within the FM (top) and waveshaping (bottom) synthesis techniques (FM: a carrier oscillator whose frequency is modulated by a second oscillator of given frequency and intensity; waveshaping: an input x(t) passed through a nonlinear function f to give f[x(t)]).
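The two units can be sketched as follows (Python, for illustration only; the carrier and modulation frequencies, modulation index and nonlinear function are arbitrary example choices):

import numpy as np

def fm_tone(carrier_freq, mod_freq, mod_index, duration=1.0, sr=48000, amp=1.0):
    # basic FM unit: a carrier whose phase is modulated by a second sinewave
    t = np.arange(int(duration * sr)) / sr
    return amp * np.sin(2 * np.pi * carrier_freq * t
                        + mod_index * np.sin(2 * np.pi * mod_freq * t))

def waveshape_tone(freq, duration=1.0, sr=48000):
    # basic waveshaping unit: a sinewave passed through a nonlinear function f
    t = np.arange(int(duration * sr)) / sr
    x = np.sin(2 * np.pi * freq * t)
    return x - 0.5 * x**3                      # f[x(t)] with an arbitrary example f

# Example: an inharmonic, bell-like FM tone and a waveshaped tone.
bell = fm_tone(carrier_freq=200.0, mod_freq=280.0, mod_index=5.0)
shaped = waveshape_tone(freq=220.0)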

The disadvantages of these models are that no analysis methods exist that can

produce a set of parameters from a given sound and that, as with physical modelling,

the sounds can lack certain 'natural' qualities [moor90].


2.7. Summary

This chapter has developed the concept of a model for sound with which to work.

The principal idea is that of representation. There are many levels on which a

representation for sound can take place, from the physical to the perceptual. Also,

several representations may be used together. An example is the chain of

representations that exists within the additive synthesis model: physical system;

pressure fluctuations at microphone; time waveform; digital audio time series; time-

varying spectrum; set of variable amplitude and frequency sinewaves; set of piece-

wise linear functions.

From a consideration of the types of creative applications where such a model

might be used, a functional description has been advanced. Central to this description

is the idea of a parameterised representation, where the parameters consist of less data

than the modelled sound and are of a form that facilitates manipulation of the sound in

useful ways.

Finally, several well known models fitting this description have been reviewed. These

models are primarily for music and speech sounds and, consequently, focus on

representing those elements that characterise such sounds, both physically and

perceptually, for example spectral lines and formant envelopes. The models, therefore,

concentrate mainly on the top two categories of Table 2.1. No models fitting the
description given in this chapter have been found in the literature for sounds that fall
outside these categories.


Chapter 3

Chaos Theory and Fractal Geometry

3.1. Introduction

This chapter presents an overview of chaos theory and fractal geometry. The

intention is to present a theoretical basis for the forthcoming chapters. Theory relevant

to each experimental chapter is then presented in that chapter. The emphasis is

therefore on the following subjects: the significance of chaos and fractals; strange

attractors; Iterated Function Systems; and several other relevant ideas and tools. The

chapter may be read in its entirety as a concise introduction to chaos and fractals, or

referred to as and when needed during later chapters. Sources for the general theory of

chaos and fractals include [stew92, farm90, glei87, laut88, deva89, schr91, peit88,

moon87, hao84, mand83, barn88].

Chaos theory is about a new understanding of dynamics, the way in which systems

behave through time. It concerns the realisation that deterministic systems which obey

fixed laws, can exhibit unpredictable behaviour. This runs contrary to the established

viewpoint, dating back to Newton, that the behaviour of deterministic systems can be

predicted for all future time. Also, chaotic behaviour, characterised by being irregular

and complex, may be found in very simple systems. This, again, apparently

contradicts the traditional scientific expectation that complex behaviour arises only in

complex systems.

The theory of fractals, however, provides a new understanding of geometry. It is

based on a realisation that there exists a large class of geometric objects not

encompassed by the traditional Euclidean geometry of points, lines and circles, or the

forms of differential calculus, for example smooth curves. Fractal objects have

properties unlike those of their traditional counterparts because of the way they fill

space. For example, they typically have dimensions which are not integers and curves

with infinite length can be contained within a finite volume. Many fractals have the

same form when viewed on different scales, a property known as self-similarity. Like

chaos, it is also possible to construct complex fractal forms using only simple rules.

Of greatest importance, perhaps, is that both chaos and fractals can accurately

represent naturally occurring phenomena. Advances in abstract theory have been

paralleled with discoveries of real-world phenomena which confirm the relevance and

usefulness of chaos and fractals. A selection of the subjects in which this has taken

place are: architecture, art, astrophysics, biology, chemistry, communications,


computing, data compression, economics, electronics, fluid dynamics, geology,

geophysics, linguistics, meteorology, music, physics, signal processing. See [glei87,

pick90, schr91, peit88, cril91, stew90 and moon87] and references therein.

3.2. The Significance of Chaos

Chaos theory concerns the dynamic behaviour of simple nonlinear systems.

Traditionally, the problem of dynamics has been approached in two different ways -

deterministic dynamics and stochastic processes. The deterministic approach assumes

that fixed laws govern the behaviour of a system. These laws may be written down

with linear differential equations, a solution found, and so the behaviour of the system

is known for all time. Such an approach applies to systems with a few degrees of

freedom and where linear relationships, or approximations, exist between the

component parts. The advantage to this approach is that the resulting solution gives

complete, predictive knowledge about the behaviour of the system. The main

disadvantage, however, lies also with the solution - it is not always possible to find

one. Analytic techniques do not provide a universal means of solution to systems of

differential equations, especially if they contain nonlinearities.

The alternative, stochastic, approach makes the assumption that the system under

investigation is too complex to be able to describe explicitly with fixed laws. This is

either because there are too many degrees of freedom, or it is not possible to measure

all the relevant aspects of the system. In this case, a partial description of the system

may be given using probability. That is, the degree of uncertainty about a system's

present state, or future behaviour may be quantified. Instead of describing the dynamic

behaviour of every degree of freedom with an explicit solution, only the likelihoods of

expected behaviour are known. These correspond to the average or typical behaviour

found by empirically accumulating information about the system. This is also a

powerful approach as, for example in thermodynamics, the average properties of

particles in a gas provide a useful description despite the exact behaviour of the

particles not being known.

Both these approaches have been maintained, side by side, in science for hundreds

of years. The deterministic description is assumed to be nearer to the true behaviour of

the system than the probabilistic one which is thought only to arise because of

ignorance about the system. It is also implied by these approaches that the degree of

complexity of the system relates to that of its behaviour. The analytic solution to a

dynamic system with few degrees of freedom is simple and regular. The complex

motion of particles in a fluid is assumed to be a consequence of the large number of

particles and their interactions.


The significance of chaos theory is that it is an understanding of dynamical

systems that combines elements of the two traditional approaches. In fact, a

deterministic chaotic system and a stochastic process can be indistinguishable to an

observer of the two systems.

Chaotic systems are deterministic and may be written down with explicit fixed

laws. They are, however, nonlinear and so, in general, no analytic solution can be

found. Instead the system may be explored by numerical integration with the help of a

computer. In fact, one of the reasons for the discovery of chaos can be attributed to the

availability of the computer.

Although chaos may occur in systems that are simple and deterministic, chaotic

behaviour is typically complex, irregular and unpredictable. The unpredictability of

chaotic behaviour does arise from ignorance about the system, but is not due to there

being too many degrees of freedom. It is ignorance about the exact state of the system,

a theoretical possibility, but practical difficulty, that manifests itself as

unpredictability of the system's future behaviour. The complexity and irregularity of a

chaotic system, however, are inherent to the system and have nothing to do with the

ignorance of an observer. Both in theory and practice, the complexity of chaotic

behaviour is manifest.

As well as being a revolution in scientific theory, chaos is significant for being

found to represent many naturally occurring dynamic processes. The list of subjects

given in the introduction to this chapter and the computer generated images shown in

the introduction to the thesis are all examples of this. It is the representation of

naturally occurring acoustic dynamics which concerns this work and will be discussed

further in the next chapter.

3.3. Dynamical Systems and State Space

A general description of a dynamical system begins with its state, x. This could

be, for example, a scalar, vector or function that gives all the information about the

system, or all that is relevant, at any one time. For this work it is considered to be a

vector. The state may then be represented by a single point in state space (sometimes

known as phase space). For a system with d degrees of freedom, x will have d

components and the state space will be d-dimensional. The behaviour of the system is

then charted by the movement of the point in state space through time. The path it

takes is known as a trajectory, and in the discrete-time case it is also known as an

orbit. This is illustrated in Figure 3.1.

If the system is deterministic then the rules governing the temporal evolution of

the point in state space may be described by a single equation. In the continuous time

case,


\frac{dx}{dt} = f(x(t))    (3.1)

which describes a flow and, in the discrete time case,

x_{t+1} = F(x_t)    (3.2)

which is a mapping. Because this work concerns modelling digital audio, the discrete

time version will be used.

Figure 3.1 State space representation of a dynamical system (each dimension is a variable of the system; the state vector evolves along a trajectory).

The function that defines the system, F, may be linear or nonlinear. It may also

depend on variables that change with time, but which change slowly relative to the

dynamics of the state, or that are changed between experiments. The variables that

define the function are, in this case, termed the parameters and are typically scalar

values.

Systems for which d-dimensional hypervolumes in state space do not change

under the action of the system in time are termed conservative, whereas those for

which hypervolumes contract are called dissipative. This relates to a description of

physical systems in which energy is either conserved or dissipated, although the

terminology is also applied to abstract dynamical systems.

3.4. Stability

Stability is a generic term for a system's response to perturbation. If a small

perturbation gives rise to a large change, the system is unstable. If the perturbation

dies away and has no long-term effect, the system is stable. Given two points in state

space, one a slightly perturbed version of the other, what happens to the two

subsequent trajectories into the immediate future? The perturbed trajectory might


converge to the original trajectory, or diverge. This is referred to as local stability at a

point. After an infinite amount of time, what happens to the two trajectories? Again,

the perturbed trajectory may return to the original, or the two trajectories may

continuously separate. This is a matter of asymptotic local stability, and is quantified

by the spectrum of Lyapunov exponents. These measure the rate at which an

infinitesimal d-dimensional ball in state space distorts, on average, under the effects of

the dynamical system mapping in state space. Since the ball is infinitesimal, it distorts

according to the linear part of the mapping, and hence distorts into an ellipsoid. The

Lyapunov exponents measure the rate of change of the principal axes of the ellipsoid

relative to the original ball. If the radius of the ith axis at time t is

r_i(t)    (3.3)

the Lyapunov exponents are defined as

\lambda_i = \lim_{t \to \infty} \lim_{r_i(0) \to 0} \frac{1}{t} \log \frac{r_i(t)}{r_i(0)}    (3.4)

Notice that the limit of infinite time and the 1/t term average the result over a

trajectory, and the other limit ensures an infinitesimal ball. For a d-dimensional state

space, there will be d Lyapunov exponents. A positive Lyapunov exponent indicates a

direction of instability, a value of zero, marginal stability, and a negative value,

stability. If the exponents are ordered according to size,

\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_d    (3.5)

then lengths in state space change as

l(t) = l(0) e^{\lambda_1 t}    (3.6)

areas as

a(t) = a(0) e^{(\lambda_1 + \lambda_2) t}    (3.7)

and volumes as

v(t) = v(0) e^{(\lambda_1 + \lambda_2 + \lambda_3) t}    (3.8)

etc. The largest Lyapunov exponent reflects the asymptotic local stability of the

system. Its polarity indicates whether the distance between perturbed states is

increasing or decreasing and its value measures at what rate. The polarity also relates

to the presence, and type of attractor. This will be discussed in the next section. The

Lyapunov exponents also indicate the dissipative or conservative nature of the system.

The polarity of the sum of Lyapunov exponents measures the rate at which state space

hypervolumes change.
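For the special case of a one-dimensional mapping x_{n+1} = F(x_n), the definition above reduces to an average of log|F'(x_n)| along a trajectory, which suggests a simple numerical estimate. The sketch below (Python, for illustration only) applies this to the quadratic logistic mapping discussed in Section 3.8, with an arbitrary parameter value in its chaotic regime:

import math

def lyapunov_1d(x0, lam, num_iterations=100000, transient=1000):
    # estimate the Lyapunov exponent of x_{n+1} = 4*lam*x_n*(1 - x_n)
    x = x0
    total = 0.0
    for n in range(num_iterations + transient):
        derivative = 4.0 * lam * (1.0 - 2.0 * x)    # F'(x)
        if n >= transient:                          # discard the initial transient
            total += math.log(abs(derivative))
        x = 4.0 * lam * x * (1.0 - x)
    return total / num_iterations

# A positive value indicates a direction of instability; e.g. lam = 0.95 is chaotic.
print(lyapunov_1d(x0=0.3, lam=0.95))

A positive estimate indicates asymptotic local instability, a negative one, stability.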

Sets in state space which are unaffected by the action of the dynamical system

mapping are termed invariant. The set B is invariant if


F^{\circ t}(B) = B    (3.9)

for all time t, where the notation F^{\circ t} indicates that the mapping F has been iterated t times. Invariant

sets also have associated stability. For example, if a point in an invariant set is

perturbed to be outside that set, does it return or move away? For example, a single

point which is invariant is termed a fixed point. A repelling fixed point is one which is

unstable, and an attracting fixed point is stable.

3.5. Attractors

An attracting set is so named because of the way it appears to "pull" nearby

trajectories towards it (it is stable) and then hold them there (it is invariant).

Conservative systems do not have attractors, but dissipative ones do. Since dissipative

systems shrink volumes at an exponential rate, in the limit of infinite time volumes are

shrunk to zero. Therefore for Equation (3.9) to hold for all time, attractors must be

sets of zero volume. The set of initial conditions in state space which are pulled

towards an attractor is termed the basin of attraction.

Attractors are important because they describe the typical long-term behaviour to

which a system settles after transients. They are useful because they allow dynamical

behaviour to be described with geometry. The geometry is not that of real physical

space, but of abstract state space. Dynamical systems theory has traditionally been

concerned with three types of attractors. These are known as regular attractors and

contain trajectories that are asymptotically locally stable (no positive Lyapunov

exponents). Regular attractors are objects in state space that have Euclidean geometry.

The three types of regular attractor are points, cycles and tori and are illustrated in

Figure 3.2.

Figure 3.2 Illustration of the three regular attractor types: point attractor, limit cycle and torus, each with a trajectory tending to the attractor.


Point attractors correspond to states of rest and are typically found in dissipative

systems that have no input of energy. For example a damped pendulum that is

perturbed will settle to a rest state. Also, linear systems may only exhibit point

attractors if they have attractors at all. A point attractor has associated Lyapunov

exponents with polarity (-,-,-,...). Cycles (or limit cycles) have Lyapunov exponents of

the form (0,-,-,...). They correspond, for example, to sustained, stable oscillations in

nonlinearly driven linear resonators, such as clock mechanisms. If a clock pendulum is

slightly disturbed, it will eventually settle back to a regular oscillation. The geometry

of a limit cycle is that of a closed loop. The periodicity of the system's behaviour

corresponds to the return of the trajectory to the same point in state space and hence

the closed loop form of the attractor. Tori attractors in state space occur for systems

which display quasiperiodic behaviour. That is, oscillations that combine two or more

incommensurate frequencies. Tori attractors have associated Lyapunov exponents of

the form (0,0,-,...). Earlier views of turbulence in fluid systems, for example,

accounted for the development of turbulence as the cumulative addition of modes in

the motion, corresponding to tori attractors of increasing dimension [stew90].

3.6. Chaos

Chaos is a type of dynamic behaviour with properties unlike the regular motion

associated with the regular attractors of the previous section. It occurs in nonlinear,

dissipative systems. Chaotic behaviour corresponds to an attractor in state space that

typically has non-Euclidean, in fact fractal, geometry. Such attractors are termed

strange attractors. Trajectories on the attractor are always asymptotically locally

unstable (Lyapunov spectra of the form (+,-,-,...) for example). Trajectories on

attractors never meet up, and so the associated motion is not periodic. The motion is

typically irregular and complex despite the system having low order (small d).

For a linear system, asymptotic local instability implies that the resulting

behaviour will be unbounded and the state will "fly off" to infinity. For a chaotic

system, the asymptotic instability occurs in conjunction with the bounding nature of

the nonlinear mapping. Consequently, neighbouring trajectories in state space are

simultaneously pushed apart, but remain on or near the attractor. The former

behaviour is termed sensitive dependence on initial conditions. The main consequence

of it is that the motion becomes unpredictable in the long term. For regular motion,

any uncertainty in the position of an initial condition due to measurement inaccuracies

can be modelled as a slight perturbation to the actual state. Because of the asymptotic

local stability, the trajectories from the actual and errorful states remain together. The

small error in knowledge of the initial conditions remains a small error for all future


time and so the system is predictable. For a chaotic system, the small error

perturbation becomes magnified exponentially at a rate governed by the largest

positive Lyapunov exponent. The slight error in knowledge of the initial condition

grows with time. Consequently, chaotic systems are predictable in the short term, but

not the long term.

An example chaotic system, and one of the first to be studied, is the Lorenz model

of atmospheric convection [lore63]. This is written as a set of three nonlinear

differential equations,

\dot{x} = \sigma (y - x)

\dot{y} = R x - y - x z

\dot{z} = x y - b z    (3.10)

where x, y, and z form the state of the system and represent physical aspects of the

atmosphere, and

\sigma = 10.0, \quad R = 28.0, \quad b = 2.67    (3.11)

are the system parameters with values that give rise to chaotic behaviour. The set of

equations are analytically intractable, but may be numerically integrated.
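A minimal numerical integration of Equations (3.10) and (3.11) can be sketched as follows (Python; a simple fixed-step fourth-order Runge-Kutta scheme is assumed here purely for illustration, and the step size, initial condition and transient length are arbitrary choices):

def lorenz_derivatives(state, sigma=10.0, R=28.0, b=2.67):
    x, y, z = state
    return (sigma * (y - x), R * x - y - x * z, x * y - b * z)

def rk4_step(state, dt):
    # one fixed-step fourth-order Runge-Kutta step of the Lorenz flow
    def add(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_derivatives(state)
    k2 = lorenz_derivatives(add(state, k1, dt / 2))
    k3 = lorenz_derivatives(add(state, k2, dt / 2))
    k4 = lorenz_derivatives(add(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b_ + 2 * c + d)
                 for s, a, b_, c, d in zip(state, k1, k2, k3, k4))

# Example: generate a trajectory on the attractor after discarding a transient.
state, dt = (1.0, 1.0, 1.0), 0.01
trajectory = []
for n in range(20000):
    state = rk4_step(state, dt)
    if n >= 1000:
        trajectory.append(state)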

Figure 3.3 shows the strange attractor of the Lorenz system. The figure is a 2-

dimensional projection of the 3-dimensional state space formed by the variables x, y

and z. Note the way in which the trajectory always passes nearby other trajectories, but

never joins them. In the limit of infinite time, an infinite length trajectory would be

contained within a finite volume. This is typical of fractal objects, and relates to the

self-similar nature of the banding illustrated in the Figure. Successive magnifications

of the trajectory on the attractor reveal the same form of detail on each scale. Figure

3.4 illustrates sensitive dependence on initial conditions by showing one of the state

variables of the Lorenz system for two simulations with similar initial conditions. The

resulting waveforms can be seen to be similar in the short term, but dissimilar in the

long term after divergence. This illustrates how long term prediction of a chaotic

system is impossible for limited knowledge of initial conditions.

3.7. Phase Portraits

An important and often used technique in work on dynamics and especially chaos is

that of 'time-delayed embedding'. In the Lorenz example of the previous section, the

state space attractor can be shown directly because the state variables x,y and z are

directly available. It is often desirable to view the state space attractor to determine its

form, look for closed loops or fractal structure. Often, however, there is no direct

access to the state variables and instead there is only access to one observed variable

as it changes with time. Under certain conditions, however, the state of the system can


be reconstructed by forming vectors of time-delayed observations. This is discussed in

greater detail in Chapter 7 when it will be necessary to be more specific about the

embedding process.

A visualisation technique, known as a phase portrait, uses embedding to

reconstruct a topologically equivalent version of a system's attractor from a time series

of observations. This is done simply by plotting the time-series against a delayed

version of itself to form a 2-dimensional projection of the attractor. Or, three time-

delayed values can be used to construct a simulated 3-dimensional view. This has

been implemented in a program written for this work called PHS which allows phase

portraits to be constructed from a time series. An important variable which relates to

the quality of the phase portrait is the size of the time-delay. This is measured in

multiples of the sampling period and can range from 1 to 100. Figure 3.5 shows the

effect on the phase portrait of different values of the time-delay. This example has

been constructed from a time-series derived from the Lorenz system shown in the

previous section. One of the Lorenz variables (x) has been recorded to form the

observed time series. Note that if the delay is too small, there is little difference

between consecutive values of the time series and something close to a straight line is

plotted. At the other extreme, if the delay is too large, consecutive values become

unrelated and the phase portrait becomes messy. Note also the similarity of the phase

portrait topology to the topology of the strange attractor shown in Figure 3.3.


Figure 3.3 Sequence of magnifications of the Lorenz attractor showing its fractal, self-similar property.

Figure 3.4 Two simulations of the Lorenz system for similar initial conditions showing sensitive dependence on initial conditions (amplitude of one variable against time; initially similar, then exponentially diverging).


Figure 3.5 Three phase portraits constructed from a time series of observations of the Lorenz chaotic system. Delay values are: (a) 1, (b) 10, (c) 100.
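The embedding step performed by such a program can be sketched as follows (Python, for illustration only; PHS itself is not reproduced here, and the stand-in time series and delay value are arbitrary choices):

import math

def delay_embed(series, dimension=2, delay=10):
    # form time-delayed vectors (s_n, s_{n+T}, ..., s_{n+(m-1)T}) from a scalar series
    n_vectors = len(series) - (dimension - 1) * delay
    return [tuple(series[n + i * delay] for i in range(dimension))
            for n in range(n_vectors)]

# Example: embedding an observed time series (here an arbitrary sinewave stands in
# for the recorded Lorenz variable); each pair (s_n, s_{n+delay}) is one plotted point.
series = [math.sin(0.05 * n) for n in range(5000)]
portrait = delay_embed(series, dimension=2, delay=10)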

3.8. Bifurcation

Another aspect of stability is the response to perturbations of a system's

parameters. This is termed structural stability. For changes in a system's parameters,

attractors may appear or disappear or change form. It is common for nonlinear systems

to display both regular and chaotic behaviour for different parameter values. Often

these relate through sequences of bifurcations. This is illustrated by the behaviour of

the simple, one-dimensional system known as the logistic mapping,

x_{n+1} = 4 \lambda x_n (1 - x_n), \quad x_0 \in (0,1), \quad \lambda \in (0,1)    (3.12)

The mapping is shown in Figure 3.6. The value of λ corresponds to the peak height of

the inverted parabola.


Figure 3.6 The logistic mapping for λ = 0.9 (x_{n+1} plotted against x_n, both on the interval [0, 1]).

For small values of λ and any initial condition x_0 ∈ (0,1), the iterated sequence of
states settles, after a transient, to a single fixed point. At a higher value of λ, the output
settles to a limit cycle consisting of an alternating sequence of two values. As λ
continues to rise, the output again changes to a limit cycle of period four and then
eight, etc. Such changes are termed bifurcations and this, in particular, is a period
doubling bifurcation. At a critical value of λ (corresponding to 4λ = 3.569...), the output becomes chaotic,
that is, aperiodic and highly irregular. Further increases in λ take the output
sequence through bands of chaotic behaviour with interspersed periodic 'windows'.

All this behaviour may be displayed with a bifurcation diagram. One is shown in

Figure 3.7 for the logistic mapping. Note that a magnification of a small portion on the

boundary of a periodic window and a neighbouring chaotic band reveals the self-

similarity of this structure.


Figure 3.7 Bifurcation diagram for the logistic mapping (the possible values of x_n plotted against λ over the range 0.7 to 1, with a magnification of a small rectangle) with corresponding time series plots.

The period doubling sequence, known as a route to chaos, is one of several ways

in which chaotic behaviour develops from regular behaviour [moon87].
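The data behind a diagram such as Figure 3.7 can be generated with a short sketch (Python, for illustration only; the parameter range, transient length and number of retained iterates are arbitrary choices):

def logistic_orbit(lam, x0=0.3, transient=500, keep=200):
    # iterate x_{n+1} = 4*lam*x_n*(1 - x_n) and return the post-transient values
    x = x0
    for _ in range(transient):
        x = 4.0 * lam * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = 4.0 * lam * x * (1.0 - x)
        orbit.append(x)
    return orbit

# Example: sweep lam from 0.7 to 1.0; each (lam, x) pair is one point of the diagram.
points = []
for step in range(600):
    lam = 0.7 + 0.3 * step / 599
    points.extend((lam, x) for x in logistic_orbit(lam))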


3.9. Statistical Descriptions of Dynamics

Because of sensitive dependence on initial conditions and the limited accuracy to

which an initial condition can be known in practice, the exact long-term behaviour of

a chaotic system becomes unpredictable. It is therefore necessary to describe

dynamical behaviour of chaotic systems statistically. This allows for a general

description of typical trajectories and the determination of average behaviour. The

way this may be done is to consider the state vector as a random vector. Its behaviour

in time is then described by a multivariate stochastic process.

It may then be asked, for example, coming to a chaotic system for the first time:

what is the likelihood of finding the system in a given state? Since a chaotic system

possesses an attractor, it is therefore known that the state vector must be found

somewhere on this attractor,

x \in A    (3.13)

The probability of finding the state in different parts of the attractor is then

described by a probability distribution function (pdf) whose support set is the

attractor. The probability of finding the state in some subset of state space, B, is then

P(x \in B) = \int_B p(x) \, dx    (3.14)

where p( ) is the pdf. Alternatively, this may be expressed in terms of a probability

measure, μ, so that

P(x \in B) = \mu(B)    (3.15)

These are equivalent descriptions. The word measure is used to mean both the

probability distribution and the probability associated with a specific set. Note that

since the attractor is the support of the measure,

\mu(A) = 1    (3.16)

An invariant measure is one that is not affected by the action of the dynamical
system mapping. That is,

\mu(F(B)) = \mu(B)    (3.17)

This is a statement of the conservation of probability as it says the probability of

finding a state in the subset B is the same as the probability of finding the state in the

subset to which B is mapped by F. A consequence of the measure being invariant is

that the random process representing the system behaviour is stationary. The

distribution of states over the attractor, i.e. the measure, is independent of time.

The measure may be used to calculate the expected value of a function of the state.

In general,

E[f(x)] = \int f(x) \, d\mu(x)    (3.18)


for some function f. The state has no absolute value, only a possible one, hence the

integration over its probability distribution. A measure is said to be ergodic if the

expected value in Equation (3.18) may be replaced with a time average so that

E[f(x(t))] = \lim_{T \to \infty} \frac{1}{T} \int_0^T f(x(t)) \, dt    (3.19)

A measure that is ergodic may be approximated by the relative frequency with

which a single trajectory visits various parts of the attractor. This may be done in

practice by dividing the attractor into bins and counting the number of times the state

falls within each bin whilst following a single trajectory. This defines a histogram

approximation to the measure. The measure of each bin is given, approximately by,

NN

nBxPB i

ii largefor(3.20)

where ni is the count in the i th bin, Bi and N is the total count over all bins.

In theory, chaotic systems have many invariant measures associated with invariant

sets in state space. These sets may be reached if the system is given certain exact

initial conditions. They are, however, not stable and are therefore not attractors.

Consequently, in practice, a real trajectory never maps out these invariant sets nor has

the corresponding invariant probability measures. This is because of the presence of

noise (experimental or computational) perturbing the state away from the unstable

sets. The ergodic, invariant measure corresponding to a typical, real trajectory is

termed the natural measure. It is this, for example, that is being approximated by the

histogramming process performed on an actual or computational chaotic system.

3.10. Fractal Geometry

It has already been mentioned that, typically, a strange attractor is a fractal object.

This section outlines the theory of fractal geometry in general and discusses its

significance to chaotic dynamics.

The word fractal was coined by Benoit Mandelbrot from the Latin fractus meaning

irregular and fragmented. His specific definition is that a fractal is a set whose

Hausdorff-Besicovitch dimension, $D_H$, is greater than its topological dimension, $D_T$

[mand83]. To understand this definition, it is necessary to consider the relationship

between two geometric concepts, those of self-similarity and dimension.

An object is self-similar if there is a similarity between the whole object and its

component parts, viewed on different scales. This is also called scale invariance. For

example, a line section, square or cube are self-similar because they may be composed

of scaled-down versions of themselves. See Figure 3.9. The fractal "triadic Koch

curve" shown in Figure 3.8 can be seen to be composed of four, one-third sized Koch


curves. These are examples of exact self-similarity. The property extends, however, to

the case where the similarity on different scales is a statistical one. Consider, for

example, a coastline. It appears to have the same irregular form when viewed on maps

made to different scales. These last two examples of self-similarity, however, may be

distinguished from the first by considering their scaling properties in the context of

dimension.

Figure 3.8 The exactly self-similar, triadic Koch curve.

The term dimension has several meanings. It is the relationship between these

meanings on which the definition of fractal is based. Dimension is used to describe

the number of independent directions in a space, or the number of coordinates

required to specify a unique point. This is often termed the Euclidean dimension, E, for real spaces of the form $\mathbb{R}^E$. Objects within such spaces are then classified according to their topological dimension, $D_T$. For example, a point has $D_T = 0$, a curve $D_T = 1$, a surface $D_T = 2$, and a volume $D_T = 3$. These regular, Euclidean forms all have integer

topological dimensions. Another type of dimension may be defined by considering the

self-similar, or scaling properties of an object. Consider again the straight line section

and its self-similar property. More specifically, let the line be composed of N versions

of the original, each scaled by a factor r<1. If the line's original length is L, then the

combined length of the scaled down parts must equal this, so

$N \cdot rL = L$ (3.21)

therefore

$Nr = 1$ (3.22)

See Figure 3.9. Next, consider a square surface. Again this can be divided into N

smaller squares each scaled by a factor r. In this case, if the length of a side of the

original square is L, by equality of areas,

$N (rL)^2 = L^2$ (3.23)


and so

$N r^2 = 1$ (3.24)

For a cube, the same argument leads to

$N r^3 = 1$ (3.25)

A consideration of the self-similar properties of these objects therefore gives the

general relationship

$N r^{D_S} = 1$ (3.26)

where $D_S$ agrees with an intuitive notion of dimension and, in these cases, with the topological dimension. $D_S$ is known as the similarity dimension [peit88].

Figure 3.9 General formula for similarity dimension derived by inspection of standard Euclidean shapes.

What, however, does this approach yield for the triadic Koch curve? By inspection

of Figure 3.8, it can be seen that the Koch curve can be divided into N = 4 versions of itself, each scaled down by a factor $r = \frac{1}{3}$. From Equation (3.26), the similarity dimension is found to be

$4\left(\tfrac{1}{3}\right)^{D_S} = 1 \;\Rightarrow\; D_S = \frac{\log 4}{\log 3} \approx 1.262$ (3.27)

which is a non-integer value, unlike those of the regular Euclidean objects.
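As a small worked check of Equations (3.26) and (3.27), the similarity dimension can be computed directly as $D_S = \log N / \log(1/r)$; the following lines (illustrative only) reproduce the values for the line, square, cube and Koch curve.

    import math

    # Similarity dimension from N r^(D_S) = 1, i.e. D_S = log N / log(1/r)
    def similarity_dimension(N, r):
        return math.log(N) / math.log(1.0 / r)

    print(similarity_dimension(2, 1.0 / 2.0))    # line:   1.0
    print(similarity_dimension(4, 1.0 / 2.0))    # square: 2.0
    print(similarity_dimension(8, 1.0 / 2.0))    # cube:   3.0
    print(similarity_dimension(4, 1.0 / 3.0))    # triadic Koch curve: ~1.2619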


The concept of a dimension relating to the scaling properties of a geometric object

may be expanded to apply generally by using the notion of measurement. Typically,

geometric measurement involves finding lengths, areas and volumes of objects. The

way these values scale with the size of the object relates to their dimension. For

example, the area of a circle scales according to the square of its radius, the volume of

a cube according to the third power of side length. The exponents of these power laws

equate, in these cases, to the topological dimensions of the objects. In general, define a

measurement function to be

$M(r) = N r^{D_H}$ (3.28)

where r is the linear extent of a generalised 'ruler' and N is the number of such rulers

required to cover the object being measured. Note that N and r now have different

meanings than they did for the discussion of similarity dimension. For a curve, M is

the length and $D_H = 1$; for a surface, M is the area and $D_H = 2$, etc. As r tends to zero, M becomes a more accurate measurement of these quantities. The value of the scaling exponent, $D_H$, may be viewed as the only one that allows M(r) to tend to a positive, finite value as r tends to zero. If $D_H$ were any larger, then M(r) would tend to infinity; any smaller, and M(r) would tend to zero and the resulting measurement would be nonsense. This approach defines $D_H$ as the Hausdorff-Besicovitch dimension

[schr91].

Approaching the Koch curve in this way, its length may be measured with

reference to its means of construction. Figure 3.10 shows how the Koch curve may be

constructed. A generator curve, labelled the original, is, at each stage of construction,

scaled and inserted in the place of each straight line segment. In the limit of infinite

iterations, the result is the Koch curve. Let the original curve be exactly covered with

4 rulers of length r. After each iteration, the number of line segments increases by a

factor of four, but their length decreases by a factor of 3. The length of the curve therefore increases by a factor of $\frac{4}{3}$. In the limit, its length will therefore become

infinite. This 'monstrous' property is compounded by the fact that a closed Koch curve

surrounds a finite area despite having infinite perimeter. See Figure 3.11.


Figure 3.10 Iterative construction of the triadic Koch curve.

Figure 3.11 Area of closed Koch curve (dark grey) is within area of circle (light grey), showing that it is finite.
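The iterative construction of Figure 3.10 is simple enough to sketch in a few lines of Python. The snippet below is an illustration only, with an arbitrary starting segment; it replaces each segment by four segments of one-third the length and confirms that the total length grows by a factor of 4/3 per iteration.

    import math

    # One iteration of the Koch construction: replace each segment by four segments,
    # each one third of the original length (the middle third becomes a triangular bump).
    def koch_step(points):
        new_points = [points[0]]
        for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
            dx, dy = (x1 - x0) / 3.0, (y1 - y0) / 3.0
            a = (x0 + dx, y0 + dy)
            b = (x0 + 2 * dx, y0 + 2 * dy)
            s = math.sin(math.pi / 3)            # apex: rotate the middle third by 60 degrees
            c = (a[0] + 0.5 * dx - s * dy, a[1] + 0.5 * dy + s * dx)
            new_points += [a, c, b, (x1, y1)]
        return new_points

    def length(points):
        return sum(math.dist(p, q) for p, q in zip(points[:-1], points[1:]))

    pts = [(0.0, 0.0), (1.0, 0.0)]               # the 'original' generator segment
    for i in range(5):
        pts = koch_step(pts)
        print(i + 1, len(pts) - 1, round(length(pts), 4))   # segments x4, length x4/3 per iteration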

Now apply the Hausdorff dimension approach to the Koch curve. Is there a value of $D_H$ that gives a stable, positive value of M as r tends to zero? The initial measurement is made with N = 4 rulers of length r. After the first iteration, there are N = 16 rulers of length $\frac{r}{3}$. For the measurement,

$M(r) = N r^{D_H}$ (3.29)

to remain constant,

$4 r^{D_H} = 16\left(\frac{r}{3}\right)^{D_H}$ (3.30)

and therefore

$D_H = \frac{\log 4}{\log 3}$ (3.31)


which is non-integer and, in this case, the same as the similarity dimension.

Applying the measurement approach to a coastline also gives, empirically, the

same kind of result. Using cartographer's dividers to measure the length of a coastline,

it is found that it typically increases as the divider width is reduced [mand83]. Also,

there exists a definite power law relationship like Equation (3.28), with a Hausdorff dimension of between 1 and 2, depending on the coastline measured.

It is the values of Hausdorff dimension, then, that distinguish between regular, Euclidean objects, and those like the Koch curve and coastline. It is these latter two that satisfy the definition

Hausdorff-Besicovitch dimension $D_H$ > topological dimension $D_T$,

and which are therefore described as fractals. The Hausdorff dimension, in this case, is often termed the fractal dimension. This description of fractal objects holds intuitively as well. The Koch curve is still topologically a curve and so a topological dimension of $D_T = 1$ applies. Because it has infinite length contained in a finite region of the plane, however, it fills space to a greater degree than a straight line and more like a surface. A fractal dimension of $1 < D_H < 2$ is therefore descriptive of this property.

Returning to chaos theory, recall that strange attractors are typically fractal objects.

It is their fractal geometry which is significant to the type of dynamical behaviour that

chaotic systems display. The fact that a trajectory of a chaotic system may be confined

to a low-dimensional, bounded subset of state space (the attractor) but does not join

up with itself (the motion is irregular and nonperiodic) means that in the limit of

infinite time, an infinite length trajectory exists within a finite volume. This is exactly

the same kind of fractal property as that possessed by the Koch curve. The behaviour

of chaotic systems is typically complex despite the system's defining equation being

simple. Again this is reflected by the fractal nature of the strange attractor. As has

been shown by the simple iterative construction of the Koch curve, simple rules can

define complex fractal objects.

3.11. Iterated Function Systems

Iterated Function Systems (IFS) is the name given to a scheme developed by

Barnsley and co-workers [barn88] for describing, generating, and manipulating a large

class of fractal objects. The scheme is theoretically well understood, and practically

robust, making it an ideal environment for computer experimentation. An IFS may be

described in three equivalent ways: as a set of geometric operations, as a Markov

process, or as a chaotic system. Each of these models may define the same fractal

object, known as an attractor. This equivalence gives insight into chaos theory as it

links fractal geometry, deterministic chaos and random processes. Also, these three


models will feature in the forthcoming experimental chapters. The geometric

construction features in Chapters 5 and 6, the deterministic model in Chapter 7 and

the Markov model in Chapter 8. The background to these three views will be

discussed in turn in the following sections. Proofs and greater detail may be found in

[barn88].

3.11.1 Contraction Mappings

An IFS is defined as a set of contraction mappings acting on a metric space,

$\{X;\; w_n : n = 1, \ldots, N\}$ (3.32)

Here, X is the metric space with a metric, or distance function, defined between any two points,

$d(x, y)$ (3.33)

and the $w_n$ are the N contraction mappings. A contraction mapping is defined as one whose action brings any pair of points closer together,

$d(w_n(x), w_n(y)) \le s_n\, d(x, y)$ (3.34)

for all x and y, where $0 \le s_n < 1$ is the contractivity factor of the mapping. Typically,

the individual $w_n$ are simple affine mappings that combine to form one simple nonlinear mapping, W,

$W(x) = w_1(x) \cup w_2(x) \cup \cdots \cup w_N(x)$ (3.35)

or

$W(x) = \bigcup_{n} w_n(x)$ (3.36)

The contractivity factor of W is the highest of the contractivity factors of the

individual mappings,

$s = \max\{s_n : n = 1, \ldots, N\}$ (3.37)

An affine mapping comprises a linear transformation combined with a translation.

Figure 3.12 shows an example of three simple affine contraction mappings in the metric space $X = \mathbb{R}^2$. Each mapping comprises a scaling by a factor of 0.5 followed by a shift. Their combination, W, is also shown.


Figure 3.12 Three affine contraction mappings on $X = \mathbb{R}^2$ and their single combination, W.

If each mapping is contractive, then so too is their combination W. If this is the

case, the repeated application of W to an arbitrary initial set is guaranteed to converge,

in the limit, to the IFS attractor, A. That is,

$A = \lim_{n \to \infty} W^{\circ n}(B)$ (3.38)

for some initial set, B. This is illustrated in Figure 3.13.

Figure 3.13 The repeated application of a contractive mapping, W, to some initial set B, tending to the limit set, or attractor, A.


As can be seen, in this case the attractor is a fractal object (a Sierpinski triangle).

This fractal attractor is invariant to the mapping, W,

A=W(A) (3.39)

This section has given the definition of an IFS and shown how a fractal object may

be constructed with the repeated application of a simple, nonlinear deterministic

mapping.
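A minimal sketch of this deterministic construction is given below. The three contraction mappings used, each a scaling by 0.5 towards a different corner of the unit square, are an assumed example whose attractor is a Sierpinski triangle; they are not the particular mappings of Figure 3.12, and all names and values are illustrative.

    import numpy as np

    # Three illustrative affine contractions of the plane, each scaling by 0.5 and
    # translating towards a different corner; their attractor is a Sierpinski triangle.
    MAPS = [lambda p: 0.5 * p,
            lambda p: 0.5 * p + np.array([0.5, 0.0]),
            lambda p: 0.5 * p + np.array([0.0, 0.5])]

    def apply_W(points):
        # W(B) = w1(B) U w2(B) U w3(B): apply every mapping to every point of the set
        return np.vstack([w(points) for w in MAPS])

    B = np.random.default_rng(0).random((100, 2))   # arbitrary initial set B
    for _ in range(6):                              # repeated application of W, Equation (3.38)
        B = apply_W(B)
    # B now contains ~73000 points lying close to the attractor A = W(A)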

3.11.2 The Random Iteration Algorithm

In the version of an IFS previously described, a fractal attractor is defined by a set

of simple contraction mappings. Their combined effect on a set - i.e. on a large

number of points in parallel - generates the attractor. If, however, the mappings are

taken individually and applied, one at a time, to a single point in the metric space, X, a

dynamical system is defined. The single point may then be considered the state of a

system, and X the state space. If the individual mappings are chosen at random, a

Markov process is defined. Let each mapping have an associated probability,

$0 < p_n < 1$ (3.40)

so that

$\sum_{n=1}^{N} p_n = 1$ (3.41)

Let the initial single point, or state, be

$x_0 \in X$ (3.42)

and the sequence generated by choosing a mapping at random according to the associated probabilities be

$x_{i+1} = w_n(x_i)$ (3.43)

where n is chosen at random for each iteration. The generated sequence of points is

found to lie on the attractor of the IFS. This process has the name of the 'Random

Iteration Algorithm' (RIA) for generating IFS attractors. The RIA in fact describes a

Markov process which possesses an invariant, ergodic measure whose support set is

the attractor of the IFS, A.
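A minimal sketch of the RIA, using the same three example mappings as in the previous sketch and equal associated probabilities, is:

    import numpy as np

    # Random Iteration Algorithm with the same three illustrative mappings as above
    MAPS = [lambda p: 0.5 * p,
            lambda p: 0.5 * p + np.array([0.5, 0.0]),
            lambda p: 0.5 * p + np.array([0.0, 0.5])]
    probs = [1 / 3, 1 / 3, 1 / 3]                   # associated probabilities p_n

    rng = np.random.default_rng(0)
    x = np.array([0.3, 0.3])                        # initial state x_0
    points = []
    for _ in range(20000):
        n = rng.choice(len(MAPS), p=probs)          # choose a mapping at random
        x = MAPS[n](x)                              # x_{i+1} = w_n(x_i), Equation (3.43)
        points.append(x)
    # the visited points lie on the IFS attractor; binning them as in Equation (3.20)
    # approximates the invariant, ergodic measure of the Markov process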

Recall that a measure describes the probability distribution of states in state space

at any one time. Let $\mu$ be some such measure. A (discrete time, first order) Markov process is defined by a Markov operator that determines the probability distribution at the next future time step entirely from the distribution at the current time,

$\mu_{n+1} = M(\mu_n)$ (3.44)

For example, $\mu$ may be a probability (row) vector describing the distribution of a

discrete state system, and M may be a square matrix containing all the probabilities of


transfer from one state to another. In this case, multiplication of the vector, $\mu$, by the matrix, M, determines the distribution of states at the next time step.
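For illustration, a two-state example of such a Markov operator, with hypothetical transition probabilities, can be written as:

    import numpy as np

    # A two-state Markov chain: a probability row vector mu propagated by a transition
    # matrix M.  The transition probabilities below are hypothetical.
    M = np.array([[0.9, 0.1],
                  [0.4, 0.6]])
    mu = np.array([1.0, 0.0])                       # initial distribution
    for _ in range(100):
        mu = mu @ M                                 # distribution at the next time step
    print(mu)                                       # converges to the invariant distribution [0.8, 0.2]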

An IFS with associated probabilities defines a Markov operator according to

$M = p_1 w_1 + p_2 w_2 + \cdots + p_N w_N$ (3.45)

If the individual mappings are contractive, this possesses an invariant measure, $\mu$, so that

$\mu = M(\mu)$ (3.46)

Because the measure is ergodic, the distribution of points along a typical trajectory

(i.e. sequence of iterations of the RIA) will approximate the invariant measure. The

RIA 'draws out', a point at a time, the IFS attractor. The relative frequency with which

any part of the attractor is visited by the trajectory approximates the measure of that

part.

Figure 3.14 shows an example of the RIA in operation. The three mappings of the

IFS are the same as those used in the previous section and which define a Sierpinski

triangle. In this example, the associated probabilities are equal, so that the resulting

probability distribution over the attractor is uniform. Figure 3.15, however, shows the

result, after ~1000 iterations, when the same three mappings are used, but with

different associated probabilities. In each case, the sum of the probabilities is equal to

1, but the mappings are weighted unevenly.

Figure 3.14 Example of the Random Iteration Algorithm (RIA) in operation. The three images show the results of iterating the Markov process (a) ~100, (b) ~300, (c) ~1000 times.

Figure 3.15 Examples of RIA attractors where the mappings are weighted with different associated probabilities.


3.11.3 The Shift Dynamical System

So far, two different systems derived from the same set of mappings have been

shown to generate the same fractal attractor. Finally, there is a third formulation of an

IFS, called a Shift Dynamical System (SDS). An SDS is a deterministic nonlinear

dynamical system displaying chaotic behaviour and possessing a strange attractor. As

with the Markov process, the SDS acts on a single point in the metric space, which

may be interpreted as the state of a dynamical system. The mapping in this case,

however, is deterministic,

$x_{n+1} = S(x_n)$ (3.47)

with

$S(x) = w_n^{-1}(x)$ when $x \in w_n(A)$ (3.48)

The system mapping, S, consists of a partition and the inverses of the individual

IFS mappings. Note that now the individual mappings are expansive because they are

the inverses of contractive ones and so will separate neighbouring initial conditions.

The system therefore displays sensitive dependence on initial conditions. The partition

is formed by the action of the individual contraction mappings, $w_n$, on A. This is shown in Figure 3.16. Note that if the individual mappings are non-overlapping,

$w_i(B) \cap w_j(B) = \emptyset$ for all $i \ne j$ (3.49)

then the partition will be formed with disjoint subsets. This is assumed to be the case

for a deterministic SDS.

Figure 3.16 Example of an IFS attractor partitioned into three disjoint subsets according to the effect of the three individual contraction mappings on the attractor.

The mapping S is then equal to the inverse of one of the individual mappings

according to which partition set the state is in. In the case of affine w's, S is locally


linear, but overall is nonlinear. It is a fixed, deterministic rule applied at each iteration,

and the resulting dynamics can be shown to be chaotic [barn88]. The IFS attractor is

then the strange attractor of a chaotic system.
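A minimal sketch of an SDS for the illustrative Sierpinski mappings used earlier is given below; the partition test (which of the three mapped copies of the attractor the state lies in) and the inverse maps are specific to that assumed example, and all other values are illustrative.

    import numpy as np

    # Shift dynamical system for the illustrative Sierpinski IFS used above: determine
    # which partition set w_n(A) contains the state, then apply that mapping's inverse.
    def S(p):
        x, y = p
        if x >= 0.5:                                # state lies in w_2(A)
            return np.array([2 * x - 1.0, 2 * y])
        if y >= 0.5:                                # state lies in w_3(A)
            return np.array([2 * x, 2 * y - 1.0])
        return np.array([2 * x, 2 * y])             # otherwise it lies in w_1(A)

    # obtain a point on the attractor with a short random iteration, then iterate S
    MAPS = [lambda p: 0.5 * p,
            lambda p: 0.5 * p + np.array([0.5, 0.0]),
            lambda p: 0.5 * p + np.array([0.0, 0.5])]
    rng = np.random.default_rng(1)
    x = np.array([0.3, 0.3])
    for _ in range(50):
        x = MAPS[rng.integers(3)](x)
    for _ in range(10):                             # the expanding map S gives a chaotic trajectory on A
        x = S(x)
        print(x)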

In subsections 3.11.1 to 3.11.3, three equivalent versions of an IFS have been

described. They each define the same fractal attractor, and in the case of the RIA and

the SDS, can be shown to have dynamics with the same statistics [barn88]. Therefore,

the geometric construction of a fractal, a Markov process and a deterministic chaotic

system are theoretically closely linked. Each of these three forms is simple and robust for computer implementation. Two further points about IFS will be of use in

the forthcoming experimental chapters of this thesis and are presented in the next two

subsections.

3.11.4 The Collage Theorem

The inverse problem for IFS is as follows: given a target set, T, find a set of IFS

mappings such that the IFS attractor they define is close to T.

The previous three subsections have shown ways to construct a complex fractal

attractor from a simple nonlinear system. The inverse problem describes the often

desired reverse process. Beginning with a complex object, is it possible to find a

simple system which can generate it? And if so, what is that system? If the target set is

to be approximated by an IFS attractor A, the collage theorem provides a useful error criterion. It says that an IFS will generate an A that is close to T if the IFS mappings applied to T form a close collage of T. The closeness between sets of a metric space may be quantified by the Hausdorff metric,

$h(X, Y)$ (3.50)

where X and Y are subsets of the metric space. A collage of the target set T with mapped versions of itself is

$\bigcup_{n=1}^{N} w_n(T)$ (3.51)

Let the closeness of this collage to the original be bounded, so that

$h\!\left(T,\; \bigcup_{n=1}^{N} w_n(T)\right) \le \varepsilon$ (3.52)

The collage theorem then states that

$h(T, A) \le \frac{\varepsilon}{1 - s}$ (3.53)

See [barn88]. That is, the closeness of the target set to the IFS attractor, and therefore

the measure of success of the inverse solution, is bounded according to the collage


error and the contractivity factor of the IFS. The better the collage and the higher the

contractivity of the mappings, the better the resulting solution to the inverse problem.
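For illustration, the collage error may be estimated numerically for finite point sets using a direct computation of the Hausdorff distance. The sketch below uses the illustrative Sierpinski mappings from earlier and a target set sampled from their own attractor, so the error should come out close to zero; all names and values are assumptions for the example.

    import numpy as np

    # Hausdorff distance between two finite point sets, and the collage error
    # h(T, w1(T) U ... U wN(T)) for the illustrative Sierpinski mappings.
    def hausdorff(X, Y):
        D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)   # all pairwise distances
        return max(D.min(axis=1).max(), D.min(axis=0).max())

    MAPS = [lambda p: 0.5 * p,
            lambda p: 0.5 * p + np.array([0.5, 0.0]),
            lambda p: 0.5 * p + np.array([0.0, 0.5])]

    # hypothetical target set T sampled from the attractor itself, so the collage error
    # should come out close to zero and the bound (3.53) is then tight
    rng = np.random.default_rng(2)
    x, T = np.array([0.3, 0.3]), []
    for _ in range(1000):
        x = MAPS[rng.integers(3)](x)
        T.append(x)
    T = np.array(T)

    collage = np.vstack([w(T) for w in MAPS])
    print(hausdorff(T, collage))                    # small: the mapped copies form a close collage of T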

3.11.5. The Continuous Dependence of the Attractor on the IFS Parameters

When the mappings of an IFS satisfy the contractivity requirement, the system

they define is structurally stable. Any small changes in the values of the mapping

parameters will correspond to a small change in the form of the resulting attractor. In fact, there is a continuous relationship between the two. Any small error in the mapping parameters will have only a small effect on the IFS attractor. Also, any

small change in the parameters may be used to effect a small change in the attractor.

3.12. Summary

This chapter has presented an outline of chaos theory and fractal geometry which

will be relevant to the rest of this thesis. Chaos is a class of complex irregular

behaviour that can occur in simple nonlinear dissipative dynamical systems. It is

defined by sensitive dependence on initial conditions and positive Lyapunov

exponents, and is typically associated with fractal strange attractors existing in the

system's state space. Chaos theory is significant because it offers new interpretations

and models of complex and unpredictable behaviour that are somewhere between the

two traditional themes of deterministic dynamics and stochastic processes. In the

setting of deterministic dynamics, chaos corresponds to a state space attractor having a

different geometry to those of traditional regular attractors. This different geometry is

that of fractals in which dimensions that relate to the space filling properties of objects

have non-integer values. Viewed from a stochastic process perspective, chaotic

dynamics is equivalent to stationary behaviour. The probability distribution of states

over the fractal attractor is described by an invariant, ergodic measure. In physical or

computer experiments, this is termed the natural measure and is seen when following

a typical trajectory of the system.

A well understood and practically convenient scheme for manipulating simple

chaotic systems that possess complex fractal attractors is provided by Barnsley's IFS.

Three equivalent forms of an IFS relate fractal geometry, Markov processes and

deterministic chaos.


Chapter 4

Applying Chaos and Fractals to the Problem of Modelling Sound

Having outlined the main features and significance of chaos theory and fractal

geometry, this chapter considers its application to the problem of modelling sound.

The chapter begins with a discussion of the reasons for using chaos and fractals, and

speculation about the possibilities. This discussion concludes with two questions.

Firstly, a diagnostic one: is there any evidence for a connection between naturally

occurring sound and chaos or fractals? Secondly, a pragmatic one: how can chaos or

fractals be used to represent sound? The rest of the chapter is then devoted to

considering these issues so that a strategy of investigation for the experimental work

can be devised.

4.1. The Reasons for Using Chaos Theory

The main reasons for applying chaos theory to the problem of modelling sound

come from considering the properties of chaos/fractals, the nature of certain sounds

and the current state of sound modelling techniques.

Consider the following three main features of chaos and fractals. Firstly, chaotic

systems and fractal objects are capable of representing and replicating a wide range of

natural phenomena. In Chapter 3, a number of subjects were listed in which chaotic

models are used to represent natural systems. And, in Chapter 1, the example was

given of computer generated fractal images that replicate natural objects such as

clouds and mountains (see Figure 1.1). Secondly, chaotic behaviour, which is complex

and irregular, may be generated from simple, nonlinear systems 'from the bottom up'.

That is, the complex behaviour emerges as a consequence of the simple nonlinear

rules and is not explicitly specified. This implies that certain complex phenomena may

be modelled with simple systems in this way. This is an alternative to the

methodology whereby complex behaviour is modelled with a complex system, the

detail of the original being explicitly specified, or imposed on the system 'from the top

down'. Thirdly, simple chaotic systems are capable of generating complex abstract

forms that are beautiful in their own right. Again, in Chapter 1, the example of the

Julia set was given. This highly intricate fractal set is the strange attractor of a very

simple system.


Now consider the nature of sound. Sound is a dynamic entity that may also be

complex and irregular. The main sources of sound around us are speech, music,

environmental sound and machine noise. Although both speech and music consist

mainly of regular, semiperiodic sounds, they do have a variety of irregular features.

For example, fricative speech sounds, such as 'ess' or 'eff', or the crash of a cymbal and

the breathy sound of a saxophone. Environmental and machine sounds are primarily

irregular and complex, for example, a burbling brook, splashing water, the roaring of

the wind, the rumble of thunder and the variety of screeching, scraping, buzzing and

humming noises made by machinery.

Is it, then, that chaotic dynamics exist in the systems that are responsible for these

types of sound? Do these sounds form an acoustic parallel to the natural images that

can be replicated, convincingly, with chaotic systems or fractal objects? Is it possible

to model the essential characteristics of complex sound 'from the bottom up' using

simple systems? Recall that a desired feature of a sound model was considered, in

Chapter 2, to be the relatively small amount of parameter data associated with the

representation of a sound. This is important for the reasons of data reduction and

manageability of the model. The prospect of modelling a complex, naturally occurring

sound with a simple system is therefore a desirable one. In any case, is it possible to

create new and beautiful abstract sound with chaotic systems and, for example,

generate an acoustic equivalent of the Julia set? These questions form the main

hypothesis of this work and are the issues that have inspired the investigation that is

presented.

Consider also the state of current sound models as described in Chapter 2. There

are two salient points. Firstly, the types of sound that these models concentrate on are

primarily regular ones, such as musical tones or speech. In fact, no models appear in

the literature for the large range of other, irregular sounds discussed above. Secondly,

the approach taken with some of these models mirrors the traditional approach taken

to problems of dynamics in general. That is, complex behaviour is modelled 'from the

top down' by complex systems and/or the systems used are linear. For example,

additive and subtractive synthesis and LPC consist of linear systems. Consider, in

particular, additive synthesis where a complex tone is modelled by describing each

component of the complex behaviour explicitly with an enveloped sinewave. The

complex detail is imposed on the model 'from the top down'. The resulting model is

always as complex as the tone it represents, the complexity not emerging from a

simpler system. On the other hand, physical modelling and F.M. synthesis are

examples of nonlinear models where complex musical tones do emerge from simpler

systems. For neither of these two, however, does there exist an analysis procedure that


can produce a model from a given sound. Also, both are models of regular musical

sounds.

So, a further aim of developing chaotic models is to complement existing models

in two respects. Firstly, by expanding the range of sound that may be modelled to

include irregular sounds and irregular aspects of primarily regular sounds. Secondly,

to approach the problem of modelling complex sound with the new methodology

whereby complexity is accounted for as emerging 'from the bottom up' in simple,

nonlinear systems.

Having considered the general reasons for, and possible advantages of modelling

sound using chaos theory, it becomes necessary to ask two questions. Firstly, a

diagnostic one: do any naturally occurring sounds have chaotic or fractal properties?

The second question is a practical one: how can chaos/fractals be used to represent

sound as part of a model?

In the following section the issue of diagnosis is discussed and a variety of

positive evidence is presented that suggests naturally occurring sound does have

chaotic and fractal properties. Following this is a discussion of the second question

which concludes this chapter with a strategy for experimental investigation.

4.2. Diagnosis of Chaotic Behaviour

The diagnosis of chaotic behaviour in a real system, or from a signal derived from

a system, is an area of ongoing research where there is still much debate about the

theory and algorithmic techniques used to identify chaos and their validity. For

example, see [farm90, gibi92, casd92 and vass]. The main approaches are based on

identification and characterisation of the two main features of chaotic dynamics:

fractal strange attractors and positive Lyapunov exponents. In both cases, algorithms

have been developed that estimate the dimension of the strange attractor, or the

associated largest Lyapunov exponents, from a signal derived from a chaotic system.

For example, see [moon87]. The basic premise is that if the attractor dimension is

nonintegeric, or the largest Lyapunov exponent positive, then chaotic behaviour is

present. Both these techniques, however, are computationally difficult to implement

and the results they provide are subject to a number of complicating factors. These

include the presence of noise in the data, the presence of transients (i.e. whether or not

the system has settled to an attractor in state space) the quantity of data and the effect

of processing the data, for example with band-limited measuring devices. There is still

much debate over the effect of these factors and how best to proceed to give

reliable results.


In practice, therefore, chaos is diagnosed with a combination of techniques and

some knowledge about the system under investigation, for example knowledge of the

existence of physical nonlinearities. As well as the two techniques mentioned, it is

common to use phase space visualisation to inspect an attractor to identify its form. It

is also common to identify the routes to chaos in a system, for example to investigate

the existence of period doubling bifurcation sequences as a parameter is varied. The

following subsections present two examples where there is evidence for chaotic

behaviour associated with systems that generate sound.

4.2.1. Chaos and Woodwind Instruments

As outlined in Chapter 2, the established physical model of many musical

instruments consists of a periodic excitation feeding a resonator. It has long been

established that the oscillations which form the excitation are the product of a

nonlinear dynamical system, while the resonator is a linear system. Lord Rayleigh

analysed the conditions of such self-sustained oscillations in nonlinear systems and

also identified the existence of a Hopf bifurcation at the threshold of oscillation

[rayl83]. As discussed in Chapter 3, bifurcations can occur in nonlinear dynamical

systems as some parameter of the system is varied. In the case of woodwind

instruments, for example, the main parameter is the blowing pressure at the

mouthpiece, controlled by the player. In woodwinds, the nonlinear system is formed

by the interaction of the reed in the mouthpiece, the incoming air source and the

reflected pressure waves coming from the resonant tube which forms the bulk of the

instrument [keef92]. Low blowing pressures are insufficient to excite the oscillation,

but as the pressure is slowly increased there comes a threshold point at which the

oscillation comes into being. This is an example of a Hopf bifurcation and its form on

a bifurcation diagram is shown in Figure 4.1.

Figure 4.1 Bifurcation diagram showing a Hopf bifurcation occurring at the threshold of oscillation in a wind instrument as the blowing pressure is increased (system parameter: blowing pressure; state of system: output pressure).


More recent research, [gibi88, gibi92 and lind88], has shown that if the blowing

pressure is increased further, a number of wind instruments, including the recorder,

oboe, clarinet and saxophone, exhibit sequences of period doubling bifurcations that

culminate with noisy chaos. This sequence can be diagnosed by inspection of the

phase portraits derived from the sound signals. In the regular mode of oscillation, the

periodic tone is seen as a simple closed loop attractor. The loop then acquires an extra

part, its length doubling with each period-doubling bifurcation. Chaos is then

exhibited by a complex, fractal attractor that does not form a closed loop.

Some of these unorthodox modes of oscillation are used in modern music and by jazz musicians and are known as 'multiphonic tones' [gibi92].

4.2.2. Chaos and Gongs

It has also been suggested that chaos is responsible for sounds produced by other

types of musical instruments, in particular, gongs and cymbals [legg89]. A first

indication of the possibility of chaotic behaviour is that the sounds produced by these

instruments are typically complex and irregular despite the instruments themselves

having simple geometric forms. Crucial to the operation of these instruments is the

presence of geometric nonlinearities, for example rims, domes and dimples, in

otherwise regular forms. It has been suggested that these nonlinearities are responsible

for chaotic dynamics in the vibrations of the instruments which are, in turn,

responsible for the noisy, irregular type of sounds they produce. Experiments have

shown that bifurcation sequences occur when such instruments are excited by

external, sinusoidal vibrations of increasing intensity.

Both these examples suggest that chaotic dynamics are responsible for certain

types of sound produced by musical instruments. Other suggestions have included the

unvoiced aspects of speech sounds [mara91], for example fricatives, where turbulent

air flows are responsible for the noise. However, no published accounts of

experimental verification have been found. As well as evidence for chaotic dynamics

being responsible for certain sounds, there is also evidence for the existence of other

fractal properties.

4.2.3. Fractal Time Waveforms

The most obvious place to look for other fractal properties of sound apart from the

strange attractor is in the time domain waveform of the sound. As a geometric object,

however, this has a subtle but significant difference to the types of objects discussed


in Chapter 3 in the section on fractal geometry. Because a sound waveform is a

depiction of amplitude as a function of time, there is no natural relationship between

the scales of the two axes. The amplitude or time scales may be set independently of

one another. But the fractal objects considered in Chapter 3, such as the Koch curve and

coastline, are examples of objects in spaces where all directions are equivalent. All the

spatial dimensions are inherently linked and so do not scale independently, but

together. Such objects are self-similar if there is similarity between the whole object

and scaled versions. The scaling is specified by a single scale factor applied to all

spatial dimensions. For time domain waveforms, however, two scaling factors are

required, one for the time scale, and the other for amplitude. If there is similarity

between the scaled version of the waveform and the whole, this is termed 'self-

affinity'.

To diagnose self-similarity and quantify it with a fractal dimension, it is necessary

to measure the object under investigation with 'rulers' of various lengths and see if the relationship

$M(r) = N r^{D_H}$ (4.1)

holds (see Equation (3.29)). Recall that this technique was used to diagnose the fractal

nature of coastlines. A number of methods and computer algorithms exist which

implement this idea, such as the box-counting method [voss88]. This idea, however,

only translates to the case of the self-affine waveform if the two independent scales of

time and amplitude are fixed in some way. The waveform can then be treated as a

geometric curve in a space where each direction is equivalent. The fixing between

scales is, however, arbitrary, and can affect the value of the fractal dimension [voss88]. For this and other reasons (see [farm90]), there is still much debate over the

validity of results obtained with fractal dimension estimation techniques.
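For illustration, a crude box-counting estimate for a sampled waveform might proceed as in the sketch below; the normalisation of both axes to [0, 1] is exactly the kind of arbitrary fixing of scales referred to above, and the test signal and parameter values are illustrative only.

    import numpy as np

    # Crude box-counting estimate of the fractal dimension of a sampled waveform.
    # Both axes are first normalised to [0, 1] -- an arbitrary fixing of the two scales.
    def box_count_dimension(signal, box_sizes=(1/4, 1/8, 1/16, 1/32, 1/64)):
        t = np.linspace(0.0, 1.0, len(signal))
        v = (signal - signal.min()) / (signal.max() - signal.min() + 1e-12)
        counts = []
        for r in box_sizes:
            boxes = set(zip((t / r).astype(int), (v / r).astype(int)))
            counts.append(len(boxes))               # N(r): boxes of side r touched by the samples
        slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
        return slope                                # slope of log N(r) against log(1/r)

    test = np.cumsum(np.random.default_rng(3).standard_normal(4096))   # Brown-noise test signal
    print(box_count_dimension(test))                # around 1.5 for a Brownian motion graph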

Despite this, there are results that assign fractal dimensions to the waveforms

derived from certain sounds. The types of sound analysed are mostly speech, but

animal noises have also been used [pick90, mara91 and sene92]. The results suggest

that the time domain waveforms have self-affine properties which are related to the

nature of the sound. For example, in speech, fricative sounds have higher fractal

dimensions than voiced sounds. This relationship appears to parallel that found to exist

between fractal dimension and the texture of fractal images. It has been found that the

fractal dimension increases according to the degree of 'roughness' or 'wiggliness' of the

texture [voss88].


4.2.4. 1/f Noise

My own investigations have also produced evidence for the existence of fractal

sound waveforms by showing that wind sound and roomtones are examples of 1/f

noise. 1/f noise is the term used to describe signals whose power spectral densities are

of the form

$S(f) \propto \frac{1}{f^{\gamma}}$ (4.2)

[voss88] over several decades of frequency, where

$0.5 \lesssim \gamma \lesssim 1.5$ (4.3)

1/f noises are signals that are statistically self-affine. To see this, consider a time

domain signal, v(t), with Fourier Transform V(f), and power spectral density

$S(f) = |V(f)|^2$ (4.4)

In general, if the time domain signal is scaled along the time axis by a (positive) factor, $\lambda$, and the amplitude is scaled by a factor, $\alpha$, that is,

$v'(t) = \alpha\, v(\lambda t)$ (4.5)

then

$V'(f) = \frac{\alpha}{\lambda} V\!\left(\frac{f}{\lambda}\right)$ (4.6)

and

$S'(f) = |V'(f)|^2 = \frac{\alpha^2}{\lambda^2}\left|V\!\left(\frac{f}{\lambda}\right)\right|^2$ (4.7)

i.e.

$S'(f) = \frac{\alpha^2}{\lambda^2} S\!\left(\frac{f}{\lambda}\right)$ (4.8)

If the signal is a 1/f noise and has power spectral density

$S(f) = \frac{1}{f^{\gamma}}$ (4.9)

and is then scaled by $\alpha$ in amplitude and $\lambda$ in time, the power spectral density will change according to (4.8) giving,

$S'(f) = \frac{\alpha^2}{\lambda^2}\,\frac{1}{(f/\lambda)^{\gamma}} = \alpha^2 \lambda^{\gamma - 2}\,\frac{1}{f^{\gamma}}$ (4.10)

If the scaling factors are chosen correctly, so that

$\alpha^2 \lambda^{\gamma - 2} = 1$ (4.11)

then the power spectral density will therefore remain unchanged. A 1/f noise is

therefore invariant to changes in scale of the time domain waveform. Since power

spectral density is an average measure, the scale invariance, or self-affinity (time and

amplitude are scaled by different factors) is a statistical one.
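For illustration, an approximate $1/f^{\gamma}$ noise may be synthesised by shaping the spectrum of white noise so that its power spectral density follows Equation (4.2); the sketch below is one simple way of doing this, with all parameter values illustrative.

    import numpy as np

    # Synthesise an approximate 1/f^gamma noise by shaping the spectrum of white noise
    # so that the power spectral density falls off as 1/f^gamma.
    def one_over_f_noise(n_samples, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        spectrum = np.fft.rfft(rng.standard_normal(n_samples))
        f = np.fft.rfftfreq(n_samples)
        f[0] = f[1]                                 # avoid division by zero at DC
        spectrum *= f ** (-gamma / 2.0)             # amplitude ~ f^(-gamma/2), so power ~ f^(-gamma)
        signal = np.fft.irfft(spectrum, n=n_samples)
        return signal / np.abs(signal).max()

    pink = one_over_f_noise(44100, gamma=1.0)       # one second of 'pink' noise at 44.1 kHz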


Figure 4.2 shows an example time series plot and the spectral form of 1/f noise

compared with that of white noise and Brown noise (one-dimensional Brownian

motion, or integrated white noise) which are also noises with power spectral densities

of the form

$S(f) \propto \frac{1}{f^{\gamma}}$ (4.12)

but where $\gamma = 0$ (white noise) and $\gamma = 2$ (Brown noise).

Figure 4.2 Time series plots and spectral density forms for 1/f noise compared with white noise and Brown noise.

1/f noise occurs naturally in a wide variety of situations, for example electronic

component noise, signals in nerve membranes, and time series derived from sunspot

activity, but a single model for its physical origins remains unknown [voss78]. More

relevant to this work is that music has been shown to be a 1/f noise. Specifically, the

fluctuations of audio power and instantaneous frequency of musical signals have a 1/f

power spectral density over the range of frequencies $10^{-4}$ to 10 Hz [voss78, hsu90 and

hsu91]. The 1/f noise, or fractal property, is therefore associated with the patterns or

fluctuations contained in the music itself, and not the sound that is heard, since sounds are fluctuations in the frequency range 10 to $10^{4}$ Hz. The 1/f property occurs for music

taken from a wide range of cultures and historical periods. The 1/f property of music

is confirmed by constructing aleatoric music with the three noises shown in Figure 4.2. The music is made by mapping the noise signal to the pitch and timing parameters

of a sequence of notes. Most listeners will agree that the sequences of notes generated

with white noise are too random and uncorrelated to sound musical. Those generated

with Brown noise are the opposite; too correlated and varying too slowly in time. 1/f

noise is found to be somewhere in the middle of these and produces the most 'music-

like' fluctuations out of the three.

In [mand83] Mandelbrot relates the presence of 1/f noise in music to the

hierarchical temporal structure that it typically possesses. He also points out that the

same 1/f structure cannot be expected to be found in the sounds themselves as the

mechanism of production is different. This may be true for resonant musical

instruments, but is not the case for some environmental sounds.

Figure 4.3 shows the power spectral densities of two environmental sounds - wind

noise and a roomtone. The wind noise is taken from a BBC sound effects compact

disc and described as "blustery wind on a beach" and is sampled at 44.1 kHz [bbc87]. The roomtone is taken from a library used for film soundtracks and described as "industrial room tone, small, with ventilation noise" and is also sampled at 44.1 kHz

[ssl89]. Both power spectral densities have been computed from approximately one

second of audio by averaging a sequence of 12 × 4096-point, non-overlapping, non-windowed FFTs. The spectral density plots are displayed on log-log graphs to reveal the 1/f relationship as a straight line. The gradient of the line relates to the exponent $\gamma$ according to

$S(f) \propto \frac{1}{f^{\gamma}} \;\Rightarrow\; \log S(f) \propto -\gamma \log f$ (4.13)

The magnitude scale shown in the graphs is actually $20 \log S(f)$. Estimated gradients give values of $\gamma \approx 1.5$ for the wind noise and $\gamma \approx 1.3$ for the roomtone.

Figure 4.3 Power spectral densities of wind noise (left) and an industrial roomtone (right), showing the 1/f characteristic over the audible range of frequencies.
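A sketch of the estimation procedure just described (averaged, non-overlapping, non-windowed FFT frames followed by a straight-line fit on log-log axes) might look as follows; the frame length matches that quoted above, while the test signal and other details are illustrative only.

    import numpy as np

    # Estimate the spectral exponent gamma: average non-overlapping, non-windowed
    # 4096-point FFT power spectra, then fit a straight line on log-log axes.
    def estimate_gamma(signal, frame_len=4096):
        n_frames = len(signal) // frame_len
        psd = np.zeros(frame_len // 2)
        for k in range(n_frames):
            frame = signal[k * frame_len:(k + 1) * frame_len]
            psd += np.abs(np.fft.rfft(frame)[1:frame_len // 2 + 1]) ** 2
        psd /= n_frames
        f = np.fft.rfftfreq(frame_len)[1:frame_len // 2 + 1]
        slope, _ = np.polyfit(np.log(f), np.log(psd), 1)
        return -slope                               # gradient of the log-log spectrum, negated

    test = np.cumsum(np.random.default_rng(4).standard_normal(12 * 4096))   # Brown-noise test signal
    print(estimate_gamma(test))                     # close to 2 for Brown noise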


In the previous sections a variety of evidence has been presented for both the

occurrence of chaotic dynamics in systems that generate sound and the existence of

fractal sound waveforms. It can therefore be concluded that there is a physical basis to

support the idea of using chaos/fractals to model sound. Moreover, attempting to

model naturally occurring sound with chaotic/fractal models would itself be a rigorous

diagnostic test. A successful model, i.e. one that satisfied the criteria given in Chapter

2 of accurate representation and compactness, would confirm the sound to be of the

same type as the model. For example, recall that LPC modelling of speech is

successful because the structure of the model - excitation feeding a linear resonator -

matches the physical structure of the system generating the sound. Likewise, if a

chaotic system successfully models a sound, for example, then this is a strong reason

for supposing that the sound is a product of a chaotic system itself.

Having confirmed the physical relevance of chaotic/fractal models for sound, the

next section considers the practical problem of how exactly to represent sound using

chaos and fractals.

4.3. Representing Sound Using Chaos and Fractals

Although the use of fractals and chaotic systems for algorithmic music

composition has received considerable attention - see for example [gogi91, pres88 and

jone88] - there is little published work in the literature on the idea of modelling sound

itself, and few experimental reports.

In [trua90], the author considers the idea of modelling sound with chaos to have

potential for a number of reasons. For example, because of its apparent relevance to

several unsolved problems in musical acoustics and its influence on science on the

arts in general. The author proposes a synthesis technique in which simple systems

known to chaos theory are used, arbitrarily, to control acoustic grains in the

time/frequency domain. He reports interesting sounding results, but no attempt is

made to model naturally occurring sound. Experiments with the same granular model

are also reported by the authors in [wasc90] who also point out what they see as a

problem with 'perceptual discontinuity'. This term describes the difference between

the way in which a fractal image may be perceived at all scales at once, whereas

differing time scales of sound are perceived in different ways as rhythm and timbre.

This issue will be discussed again in Chapter 5. Work reported in [rode93 and

maye93] is also on the subject of creating abstract synthetic sound with a known

chaotic system, in these cases the chaotic electronic circuit of Chua. In both cases, the


authors report the generation of classes of music-like sounds by converting one

variable of the chaotic system into an acoustic signal.

Although such accounts are relevant to the subject of this thesis, no accounts could

be found in the literature that consider the problem of an analysis/synthesis model for

sound. It is this subject that is discussed next.

How, then, can chaos theory be used to model sound? In other words, what is a

suitable representation for sound that can exploit the properties of chaos theory and

that is physically relevant to sound? The previous section has shown that there is some

evidence for chaotic systems being responsible for sound and that graphs of some

sound waveforms have fractal properties. This, then, suggests two approaches:

1) to represent the dynamics of the sound with a chaotic system, and

2) to represent the graph of a sound signal with a fractal object.

Recall from Chapter 3 that strange attractors embody the characteristics of chaotic

dynamics and may themselves be fractal objects. They are therefore suitable for

representing sound in both the forms described above. In the first case, assuming the

system generating the sound is chaotic, its dynamics may be represented with a

strange attractor. If this strange attractor is modelled, then effectively, so too is the

sound. In the second case, the sound is treated as a static waveform, or geometric

object. If this has fractal properties, it can be modelled with a fractal object, such as a

strange attractor. In both cases, the desire is to find a solution to the inverse problem.

That is, given a fractal object, find a system whose strange attractor matches it.

It is important to understand the difference between these two cases of the inverse

problem. In the first case, chaotic dynamics are assumed to be responsible for the

sound, and it is these which are to be modelled via the strange attractor. In the second

case, however, there is no assumption or modelling of chaotic dynamics. Instead, the

signal waveform, or graph, is being treated as a static object with fractal geometry and

modelled as such. Strange attractors may then be used as a source of fractal objects,

but the dynamics associated with them have nothing to do with the dynamics of the

sound. This situation is similar to that where fern leaves are replicated with IFS

strange attractors (see Figure 1.1). The dynamics associated with the IFS attractors

have no relevance to the fern leaf, since a fern leaf is not a chaotic system. It is,

however, a fractal object and it is this which is being replicated with a fractal strange

attractor.

Having concluded that strange attractors are suitable objects for representing

sound, it is further proposed that IFS are emphasised as the framework within which

to work. There are several reasons for using IFS which relate to their properties


outlined in Chapter 3. Firstly, they allow the generation of a large class of fractal

strange attractors with simple systems. Secondly, they are robust to computer

implementation. Thirdly, they are well understood and a number of theories about

them are of practical use. One of these is the collage theorem, which sets an error

criterion for the solution of the inverse problem (see Chapter 3). Another is the

property of continuous dependence of attractors on parameters. This is an important

property of IFS if the model is to be used to manipulate sound. It guarantees that,

while the IFS mappings are contractive, any small change in the mapping parameters

will effect only a small change in the IFS attractor. There is a continuous relationship

between the two. It would be undesirable if this was not the case, since a small change

in parameters, either intentional or due to errors in implementation, would effect a

drastic change in the behaviour of the model. To illustrate the property of continuous

dependence and again suggest the possible advantages of chaotic modelling of sound,

Figure 4.4 shows a set of IFS attractors, each of which has been generated by making

small changes to the parameters of the fern leaf IFS shown in Figure 1.1. It can be

seen how the small changes in the parameters result in small, but impressive,

transformations of the image.

This example also illustrates the power of modelling objects with strange

attractors. Not only are the complex IFS attractors defined by just 28 parameters,

making the model easily manageable, but the images may be easily and impressively

manipulated. Again, the question may be asked: could an acoustic equivalent of this

model be found?

Figure 4.4 A demonstration of the property of continuous dependence of IFS attractors on the parameters that define them. This also illustrates the power of manipulation possible with chaotic models [frac90].


4.4. Summary

This chapter has discussed the idea of applying chaos theory to the problem of

modelling sound. This discussion has included the reasons and possible advantages of

using chaos theory and the physical evidence to support the idea, and has concluded with an

approach to the problem. This is to use strange attractors to represent sound in two

different ways and to emphasise the use of IFS. The next four chapters (Chapters 5-8)

present a number of experimental investigations that explore this idea. The first two

concern using Fractal Interpolation Functions (FIF) to model the waveforms of sound

signals. FIFs are a class of IFS whose attractors form continuous, single-valued

functions of one variable and are therefore ideal for representing signal waveforms.

Chapters 7 and 8 then present an exploration of dynamic models for chaotic sound.

Note, then, that the approach labelled 2) above is considered first.


Chapter 5

Fractal Interpolation Functions

This chapter presents the first of the experimental approaches to modelling sound

with strange attractors. It investigates the use of IFS as a synthesis-only technique. A

particular class of IFS is used, called 'Fractal Interpolation Functions' or 'FIFs', whose

attractors are used to represent sound waveforms. The theory of FIFs is outlined, and a

sound synthesis algorithm developed. This is followed by a number of experimental

explorations which have two aims: to develop an understanding of the nature of FIFs

and the synthesis algorithm; and to search for interesting and potentially useful

sounds. This is followed by a number of advanced experiments where the basic FIF

algorithm is used as part of a more sophisticated model.

5.1. Theory

An FIF is an IFS whose attractor has the special form of being a single-valued function of one variable. An FIF is therefore defined on the plane, $\mathbb{R}^2$, and is an object

with the same geometry as the waveform of a sound signal. An FIF is constructed so

that, as its name suggests, a set of points is interpolated by a fractal function. FIFs

therefore complement the traditional non-fractal interpolating functions such as piece-

wise linear functions and polynomials. Also, complex fractal functions may be

constructed to interpolate only a few interpolation points.

The interpolation points, in combination with another set of values, uniquely

define the FIF as they define the mapping parameters of an IFS. The parameters that

define an FIF are a restricted class of affine mapping parameters on the plane. Let

{ (x_n, y_n) ∈ R^2 : n = 0, 1, 2, ..., N }    (5.1)

be a set of interpolation points lying on the plane and let them be ordered such that

x_0 < x_1 < ... < x_N    (5.2)

The points may then be interpolated by any continuous function of the form

f : [x_0, x_N] → R    (5.3)

such that

f(x_n) = y_n  for all n    (5.4)

Now consider how the interpolation points define a set of contraction mappings.

For each consecutive pair of points,

(x_{n-1}, y_{n-1}) and (x_n, y_n)    (5.5)


a mapping can be defined that maps the end points,

(x_0, y_0) and (x_N, y_N)    (5.6)

to this pair. By choosing this mapping to be an affine shear map of the form,

w_n [ x ; y ] = [ a_n  0 ; c_n  d_n ] [ x ; y ] + [ e_n ; f_n ]    (5.7)

the parameters a_n, c_n, e_n and f_n may be derived from the constraints given by the

interpolation points. That is, the end points must map to the consecutive pair of

interpolation points

w_n [ x_0 ; y_0 ] = [ x_{n-1} ; y_{n-1} ]    (5.8)

and

w_n [ x_N ; y_N ] = [ x_n ; y_n ]    (5.9)

for each n=1,2,...,N.

Solving (5.7), (5.8) and (5.9) gives,

a_n = (x_n - x_{n-1}) / (x_N - x_0)    (5.10)

c_n = ( y_n - y_{n-1} - d_n (y_N - y_0) ) / (x_N - x_0)    (5.11)

e_n = ( x_N x_{n-1} - x_0 x_n ) / (x_N - x_0)    (5.12)

and

f_n = ( x_N y_{n-1} - x_0 y_n - d_n (x_N y_0 - x_0 y_N) ) / (x_N - x_0)    (5.13)

while the d_n remain variable. It can be seen that such a mapping is contractive in the
x-direction since

0 < (x_n - x_{n-1}) / (x_N - x_0) < 1    (5.14)

It is also a shear transform, so that lines parallel to the y-axis remain so after being
mapped. The length of lines parallel to the y-axis, however, changes by a factor d_n.
The d_n are therefore called the 'vertical scaling factors'. Limiting |d_n| < 1 ensures that
the mapping, w_n, is contractive. Figure 5.1 shows an example of four interpolation

points and the effect of the three shear transforms that they define.


Figure 5.1 An example of the effect of three shear maps, w_1, w_2 and w_3, on the area A, and an illustration of one of the vertical scaling factors, d_1.

A set of interpolation points defines a set of mappings, w_n, each of which is
contractive. The set of mappings therefore defines an IFS that is guaranteed to possess
an attractor. This is the main theorem of FIFs [barn88]: given the IFS,

{ R^2 ; w_n, n = 1, 2, ..., N }    (5.15)

where the w_n are derived from (5.10)-(5.13) and |d_n| < 1, then there exists an attractor,
G ⊂ R^2, that is a continuous interpolation function satisfying

{ (x_n, y_n) : n = 0, 1, ..., N } ⊂ G    (5.16)

This theorem is demonstrated in practice in the forthcoming section, 5.3, after a

synthesis algorithm has been developed to implement it.

The vertical scaling factors, d_n, control the 'wiggliness' or 'roughness' of G, in a way
analogous to the way a 'depth of modulation' parameter controls amplitude
modulation, for example. If all the d_n = 0, the attractor, G, reduces to a non-fractal
piece-wise linear interpolation of the data points (x_n, y_n). The d_n in fact relate to the

fractal dimension, D, of G via

Σ_{n=1}^{N} |d_n| a_n^{D-1} = 1    (5.17)

when

Σ_{n=1}^{N} |d_n| > 1    (5.18)

and the interpolation points do not all lie on a single straight line.
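For illustration, consider N maps defined by equally spaced interpolation points, so that every a_n = 1/N, with a constant vertical scaling factor d_n = d chosen so that N|d| > 1. Equation (5.17) then reads N |d| (1/N)^(D-1) = 1, which rearranges to D = 2 + log|d| / log N. The equally spaced sinewave example of Section 5.3, with N = 16 maps and all d_n = 0.5, would therefore have a fractal dimension of D = 2 + log 0.5 / log 16 = 1.75, part-way between a smooth curve (D = 1) and a plane-filling set (D = 2).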

5.2. The Synthesis Algorithm

In Chapter 3, three alternative forms of an IFS were described that define an

attractor from a set of mappings. Two of these forms are commonly used as the basis

for generation algorithms for displaying the IFS fractal attractors graphically. These


are the deterministic, geometric approach and the random iteration algorithm

[barn88]. Since FIFs are IFS attractors, either of these algorithms may be used to

generate the FIF from the interpolation points via the contraction mappings. For this

work, however, the desired result is not a fractal image, but a fractal waveform that

can be converted into sound. In particular, the waveform will be digital audio and

therefore a sequence of integer values. For this reason, a new generation algorithm
has been devised which produces an FIF in the correct format to be converted directly

into sound.

The synthesis algorithm is a version of the deterministic IFS algorithm which

works in the two-dimensional discrete space

T × I    (5.19)

which is a quantised approximation to R^2, the space on which an IFS is defined.
I is the set of quantised amplitude values of the waveform:

I = { -Q, -Q+1, ..., Q-2, Q-1 }    (5.20)

where 2Q is the number of quantisation levels, and T is discrete time,

T = { 0, 1, 2, ..., T-1 }    (5.21)

where T is the length, in samples, of the waveform. The set of interpolation points are
now

{ (x_n, y_n) ∈ T × I : n = 0, 1, 2, ..., N }    (5.22)

with the added restrictions that

x_0 = 0 and x_N = T - 1    (5.23)

so that the first and last interpolation points span the entire length of the waveform

which is of T samples.

The input to the synthesis algorithm is the set of interpolation points and the set of

vertical scaling factors. The algorithm consists of two parts. Firstly, the mapping

parameters are calculated from the interpolation points, then secondly, the mappings

are used to generate the FIF. The output of the algorithm is a discrete time, discrete

amplitude version of the FIF, G:

{ z_t ∈ I : t = 0, 1, ..., T-1 }    (5.24)

The FIF generation part of the algorithm is an implementation of Equation (3.38)

which indicates that the attractor may be obtained by iterating the combined

contraction mappings of the IFS,

A = lim_{i→∞} W^i(B)    (5.25)

for some arbitrary initial set B. For this, the FIF case, the combined mapping is the

union of the shear mappings,

W = w_1 ∪ w_2 ∪ ... ∪ w_N    (5.26)


and the attractor is the sequence z_t, which is the discrete time, discrete amplitude

approximation to the FIF G. The iteration of (5.25) is implemented by mapping the

contents of one array to another, back and forth, a finite, not infinite, number of times.

Let

{ u_t ∈ I : t = 0, 1, ..., T-1 }    (5.27)

be the contents of one array and

{ v_t ∈ I : t = 0, 1, ..., T-1 }    (5.28)

be the other and let i be the number of iterations. The synthesis algorithm consists of

the following steps:

Calculate the mapping parameters

Input the set of N+1 interpolation points, { (x_n, y_n) : 0 ≤ n ≤ N }, and the set of N
vertical scaling factors, { d_n : 1 ≤ n ≤ N }.

From these, calculate the parameters of the N mappings, w_n, that is the set of
parameters { a_n, c_n, e_n, f_n : 1 ≤ n ≤ N }, from Equations (5.10)-(5.13).

The deterministic algorithm

Initialise the array, u_t, with some arbitrary initial set B, e.g. let u_t = 0 for
t = 0, 1, ..., T-1.

Use W to map the contents of the array u_t into the contents of the array v_t.
That is, for each of the mappings w_n, map each element of u_t to the other array
according to

v_j = c_n t + d_n u_t + f_n    (5.29)

where

j = int[ a_n t + e_n ]    (5.30)

and int[.] returns the integer value.

Copy the contents of v_t into u_t.

Iterate the above two steps i times.

Either array then contains an approximation to the FIF,
G = { z_t ∈ I : t = 0, 1, ..., T-1 }, and may be treated as digital audio and converted into

sound with a suitable digital-to-analogue converter.
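To make the above steps concrete, the following is a minimal C sketch of the deterministic synthesis algorithm. It is not the thesis implementation: the function name fif_synthesise is illustrative, amplitudes are kept as doubles rather than being quantised to 16-bit integers, and no error checking beyond the index guard is attempted.

#include <stdlib.h>

/* Deterministic FIF synthesis: N maps defined by N+1 interpolation points
   (x[0..N], y[0..N]) and N vertical scaling factors d[0..N-1]; the points are
   assumed to satisfy x[0] = 0 and x[N] = T - 1. */
void fif_synthesise(int N, const double *x, const double *y, const double *d,
                    int T, int iterations, double *out /* length T */)
{
    double *a = malloc(N * sizeof *a), *c = malloc(N * sizeof *c);
    double *e = malloc(N * sizeof *e), *f = malloc(N * sizeof *f);
    double *u = calloc(T, sizeof *u), *v = calloc(T, sizeof *v);
    double span = x[N] - x[0];

    /* Mapping parameters from Equations (5.10)-(5.13). */
    for (int n = 1; n <= N; n++) {
        a[n-1] = (x[n] - x[n-1]) / span;
        e[n-1] = (x[N] * x[n-1] - x[0] * x[n]) / span;
        c[n-1] = (y[n] - y[n-1] - d[n-1] * (y[N] - y[0])) / span;
        f[n-1] = (x[N] * y[n-1] - x[0] * y[n] - d[n-1] * (x[N] * y[0] - x[0] * y[N])) / span;
    }

    /* Deterministic iteration: u starts as the arbitrary set B (here y = 0);
       each pass applies every shear map to every element of u, writes the
       result into v, and then v replaces u. */
    for (int it = 0; it < iterations; it++) {
        for (int t = 0; t < T; t++) v[t] = 0.0;
        for (int n = 0; n < N; n++)
            for (int t = 0; t < T; t++) {
                int j = (int)(a[n] * t + e[n]);           /* new time index, (5.30) */
                if (j >= 0 && j < T)
                    v[j] = c[n] * t + d[n] * u[t] + f[n]; /* new amplitude, (5.29) */
            }
        for (int t = 0; t < T; t++) u[t] = v[t];
    }

    for (int t = 0; t < T; t++) out[t] = u[t];
    free(a); free(c); free(e); free(f); free(u); free(v);
}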


5.3. Experiments with the Synthesis Algorithm

The first set of experiments uses simple patterns of interpolation points derived

from sinewaves to illustrate the way the synthesis algorithm works, and to begin to

understand the properties of FIFs.

Table 5.1 shows a set of 17 interpolation points derived from equally spaced

samples of a single cycle of a sinewave. Note that there are only 16 vertical scaling

factors as each pair of interpolation points defines a mapping. In this case T has been

set to 48,000 samples which makes the resulting FIF last one second as a sound since

the sampling rate used is 48kHz. Also, Q=32768, since 16-bit digital audio is being

used. Note also that, in general, no two x values may be the same and all x values
must be in ascending order.

interpolation points              vertical scaling factor, d
x value     y value          Table 5.1      Table 5.2
0           0                -              -
3000        5740             0.5            0.05
6000        10607            0.5            0.1
9000        13858            0.5            0.15
12000       15000            0.5            0.2
15000       13858            0.5            0.25
18000       10607            0.5            0.3
21000       5740             0.5            0.35
24000       0                0.5            0.4
27000       -5740            0.5            0.45
30000       -10607           0.5            0.5
33000       -13858           0.5            0.55
36000       -15000           0.5            0.6
39000       -13858           0.5            0.65
42000       -10607           0.5            0.7
45000       -5740            0.5            0.75
48000       0                0.5            0.8

Table 5.1 (left columns) example set of interpolation points and vertical scaling factors that define the FIF shown in Figure 5.2. Table 5.2 (right column) vertical scaling factors used in generating Figure 5.3.
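For illustration, the parameters of Table 5.1 could be generated and passed to the synthesis sketch of Section 5.2 as follows; make_table_5_1 is an illustrative helper, not part of the thesis software.

#include <math.h>

#define N_MAPS 16
#define T_LEN  48000

/* 17 equally spaced samples of one cycle of a sinewave of amplitude 15000,
   with every vertical scaling factor set to 0.5, as in Table 5.1. */
void make_table_5_1(double *x, double *y, double *d)
{
    for (int n = 0; n <= N_MAPS; n++) {
        x[n] = 3000.0 * n;
        y[n] = 15000.0 * sin(2.0 * M_PI * x[n] / T_LEN);
        if (n > 0) d[n-1] = 0.5;
    }
}

A call such as fif_synthesise(N_MAPS, x, y, d, T_LEN, 5, out) would then correspond to the five-iteration sequence shown in Figure 5.2.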


Figure 5.2 The initial arbitrary set, B (in this case y=0), and a sequence of five iterations (i=1 to i=5) of the deterministic algorithm.

Figure 5.2 shows the initial arbitrary set, B, and a sequence of iterations of the

deterministic algorithm. What is shown is the contents of one of the arrays at each

stage of the generation process. B is chosen to be the function y=0, in other words

each element of the array is made to be zero. The first iteration therefore results in the

interpolation points being interpolated by a piece-wise linear function. The remaining

waveforms show the sequence converging to the FIF which is the IFS attractor of the

mappings. Notice the small difference between the fourth and fifth iterates. Although

this example illustrates the FIF generation algorithm, the resulting sound is only a

simple regular tone and can be heard as Sound 1.

To see the effect of the vertical scaling factors on the FIF, Figure 5.3 shows the

result after 5 iterations using the same interpolation points, but where the vertical

scaling factors are those shown in Table 5.2. As the value of the vertical scaling factor

increases the 'depth of modulation' in the result increases. This FIF can be heard as

Sound 2.


Figure 5.3 FIF for equally spaced interpolation points derived from a single cycle of a sinewave, but where the vertical scaling factors increase for the mappings from left to right.

The sounds of these examples are regular because the interpolation points are
regularly spaced along the x-axis and placed in an ordered pattern. The next

experiment, however, also derives a set of interpolation points from a single cycle of a

sinewave, but such that the points are spaced in the x direction according to a square

law. This uneven spacing has a considerable effect on the resulting sound. The

interpolation point values are found from the simple rules

x_j = 4 j^2    (5.31)

and

y_j = sin( 2π x_j / 65536 )    (5.32)

where

j = 0, 1, 2, ..., 128    (5.33)

is the interpolation point index. This gives a total of 129 interpolation points and an

FIF length of 65,536 samples. All vertical scaling factors are set to 0.9 to exaggerate

the modulation effect.
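In code, these points might be constructed as below; the amplitude scale of 15000 applied to the sine term is an assumption, standing in for whatever scaling was used to fill the 16-bit range, since Equation (5.32) as printed leaves it implicit.

#include <math.h>

/* Illustrative construction of the 129 square-law interpolation points of
   Equations (5.31)-(5.33). */
void make_square_law_points(double *x, double *y, double *d)
{
    for (int j = 0; j <= 128; j++) {
        x[j] = 4.0 * j * j;                                 /* Equation (5.31) */
        y[j] = 15000.0 * sin(2.0 * M_PI * x[j] / 65536.0);  /* Equation (5.32), scaled */
        if (j > 0) d[j-1] = 0.9;   /* all vertical scaling factors set to 0.9 */
    }
}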

The first waveform of Figure 5.4 shows the resulting FIF after 3 iterations. As a

result of using unevenly spaced interpolation points, the corresponding sound of this

FIF is much more complex than that of the previous example. It consists of a complex

pulsing sequence of percussive tones. Both the tones fall in pitch, and the rhythm

slows down, a consequence of the same structure existing on different scales, but

being perceived differently. This can be heard as Sound 3. The nature of this sound is

a consequence of the self-similarity of the FIF. This 'acoustically fractal' property is

further illustrated by playing the sound at different speeds, for example with a

variable-speed tape player, a musical 'sampler', or by varying the playback sampling

rate of the digital-to-analogue converter. When playing the waveform at one octave

lower, or half the speed, the sound remains similar to the original: there is no change

in the perceived pitch although the sound lasts twice as long. This can be heard in

Sound 4 where the sound is played as a sequence of descending octave intervals.


The fractal property of this FIF can also be seen in the sequence of magnifications

shown in Figure 5.4. Notice how the third in this sequence of waveforms is similar to

the first.

Figure 5.4 FIF where x values are spaced according to a square law. Sequence of magnifications of windows is shown in (a)-(d).

This experiment also illustrates a particular problem encountered with the

synthesis algorithm due to the limited resolution of the array. The array is only a

discrete approximation to the continuous space required by IFS theory and therefore

limits the detail of the result. When one of the mappings maps the whole waveform to

in-between a pair of consecutive interpolation points, a sub-sampling process takes

place in the time domain. This occurs when Equation (5.30) is implemented. When

there is a large amount of detail in the whole waveform, an effect akin to aliasing

occurs so that the mapped version contains distortions. When these distortions are

compounded over a number of iterations, a significant amount of noise can be added

to the resulting FIF. This can be avoided by reducing the number of iterations to

generate partially self-similar FIFs, where there is a limit to the number of scales on

which detail is added. This is why the previous example was only iterated three times.

The effect of continuing to iterate the same mappings to i=6 is shown in Figure 5.5.

As can be seen, the result contains more irregular noise than that shown in Figure 5.4.


This problem also explains the irregularity shown in the detail of the third waveform

in Figure 5.4.

The next set of interpolation data is, in contrast to the ordered ones derived from

sinewaves, based on uniformly distributed pseudo-random numbers. Figure 5.6 shows

one example of the result when all parameters are randomised. The x values are scaled

in the range 0-T, where again T=48,000, and then sorted in ascending order of size.

The y values cover the whole amplitude range of -Q to Q-1 and the d values cover the
range -1 to 1. The sound produced is difficult to describe, being rough and fragmented and

having a jolting, fluttering quality. It can be heard as Sound 5. If only the y values are

randomised, the x values evenly spaced and the d values made constant, the result is,

despite appearances (see Figure 5.7(a) ) predominantly tone-like with a background

that sounds like the wind. This is presented as Sound 6. This example again

emphasises the importance of the x values - their regularity dominates the resulting

sound. Finally, random y values are combined with x values following the square law

of Equation (5.31), producing a similar effect as with the sinewave of an apparently

falling pitch. In this case, there is no definite pitch, but a 'rushing' noise. This

waveform is shown in Figure 5.7(b) and can be heard as Sound 7. Notice the

similarity of the waveform plots for the last two examples, despite a large difference

in their sounds.

Figure 5.5 Same interpolation points as Figure 5.4, but with 6 iterations showing the cumulative effect of errors in the algorithm. The bottom plot is a magnification of the middle ~1000 points of the top plot.

Figure 5.6 FIF generated from random x, y and d values for the interpolation points.


Figure 5.7 (a) (left) FIF generated with random y values, but evenly spaced x. All d = 0.9. (b) (right) FIF generated with random y, but square law x values. All d = 0.9.

This set of experiments with the FIF synthesis algorithm has shown a number of

things. They have illustrated the sequence of iterations that takes place in the

algorithm; shown how regularly spaced interpolation points produce regular sounds;

presented examples of complex waveforms generated from a relatively small number

of parameters; shown the fractal properties of such waveforms both visually and

acoustically; and shown a problem arising from using a discrete implementation

which has limited resolution.

5.4. Rhythm/Timbres

The FIF generated in the previous section that has audibly fractal properties

(Sound 3) is an example of a sound where both its rhythm and the timbre are the

expression of the same information. This information exists simultaneously on

different times scales which are then perceived differently. This one example suggests

that there exists a new class of fractal rhythm/timbre sounds which do not occur

naturally, nor have been generated with other synthesis techniques. Further

experimentation has confirmed that this is the case. Table 5.3 and Figure 5.8 show the

input parameters and a waveform plot of the resulting FIF whose sound is an abstract

percussive rhythm. This may be heard as Sound 8. The input data was the product of

heuristic experimentation with the placing of interpolation points.

The self-similar properties of this waveform can be clearly seen from the plot,

where the left-hand half appears on the right scaled by a half, quarter, eighth etc.

Again, this self-similarity can also be heard by playing the waveform at different

speeds, the overall quality of the sound remaining unchanged except for its length.

This can be heard in the sequence that comprises Sound 9.


Table 5.3 and Figure 5.8 Input data and waveform plot of the resulting FIF that is a rhythm/timbre.

Having confirmed that rhythm/timbres may easily be generated with a small

number of input parameters, the next set of experiments show how it is possible to

design such sounds by specifying the form of the rhythm first. The method used is to

fit interpolation points around a design for the desired macro-level rhythm, leaving the

FIF procedure to fill in the micro-level detail to provide the timbre. Figure 5.9(a)

shows how an original rhythm design, top, is made into a crude waveform with 18

interpolation points, middle. The function is the same for each of the three 'beats' with

vertical scaling factors chosen to decrease along the waveform to create a decay in the

sound. This is then processed by the FIF algorithm to produce the resulting sound,

bottom. This result can be heard as Sound 10. Figure 5.9(b) shows a similar

construction, where the end note has been moved to the first half of the bar. Also, a

less complex waveform is used, without such a harsh attack to the beats. The result is

quite different, emphasising the way in which the rhythm and timbre are highly related

via the self-similar property of the waveforms. The second rhythm can be heard as

Sound 11.

The rhythm/timbre sounds presented in this section, then, are examples of

complex, interesting and potentially useful musical sounds that may be easily

generated with the simple FIF model using only small amounts of data. It is believed

that they form a new class of previously unheard sounds. Also, they may be

constructed to approximate a desired rhythm and have a range of novel, unusual

timbres.

Table 5.3 (input data):

x value     y value     vertical scaling factor, d
0           0           -
3           4           0.5
6           8           0.5
12          16          0.5
24          32          0.5
48          64          0.5
96          128         0.5
192         256         0.5
384         512         0.5
768         1024        0.5
1536        2048        0.5
3072        4096        0.5
6144        8192        0.5
10000       0           0.5
20000       0           0.5


Figure 5.9 Development of two rhythm/timbres, (a) and (b), from rhythmic design, top, through interpolation points, middle, to final waveform, bottom.

5.5. Generating Time-Varying FIF Sounds

The previous experiments have concentrated on generating a sound with a single

FIF. Typically, the FIF is ~50,000 samples long corresponding to a sound lasting ~1

second. An alternative is to create sounds from a combination of many shorter FIFs,

for example where each FIF is only hundreds of samples long and corresponds to a

single cycle of a tone. The principle behind the next few experiments is to form

sounds by concatenating a large number of short FIFs such that each consecutive FIF

is related by having similar parameters. By virtue of the 'continuous dependence of

attractors on parameters' property of IFS, a gradual variation of the parameters over

the duration of the sound will effect a gradual transformation of the sound. The result

is then a dynamic, time-varying sound whose properties are swept as the parameters

are swept.

The general algorithmic template for these experiments relies on iterating a simple

indexed loop. Within the loop, a set of FIF parameters are generated from a simple

rule that is a function of the loop index, and then an FIF is generated from these

parameters and appended to the composite sound.


The algorithm template

for index = start to end

calculate interpolation point/vertical scaling factor parameters as a

function of index

generate a short FIF with the parameters

accumulate generated samples in output file

next
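A minimal C sketch of this template is given below. It assumes the fif_synthesise() routine sketched in Section 5.2 and a caller-supplied rule() function that fills the interpolation points and vertical scaling factors for a given frame index; all names are illustrative.

void fif_synthesise(int N, const double *x, const double *y, const double *d,
                    int T, int iterations, double *out);

void generate_time_varying_sound(int frames, int frame_len, int n_maps, int iterations,
                                 void (*rule)(int index, double *x, double *y, double *d),
                                 double *out /* length frames * frame_len */)
{
    double x[64], y[64], d[64];   /* room for up to 63 mappings */
    for (int index = 0; index < frames; index++) {
        rule(index, x, y, d);                      /* parameters as a function of index */
        fif_synthesise(n_maps, x, y, d, frame_len, iterations,
                       out + index * frame_len);   /* append the short FIF */
    }
}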

The first example of this scheme is a sound made of 220 separate FIFs each being

220 samples long. The sound is then ~1 second long at the 48kHz sample rate used.

The FIF length was chosen to correspond to a fundamental period of ~220Hz, which

is one octave below middle A. Each FIF is generated from 4 interpolation points, the

first and last being fixed to y=0. The interpolation point control function then modifies

the middle two points in three ways. Firstly, it reduces their amplitude from full scale

to ~0 in an exponentially decreasing way. This is chosen to mimic the decay of a

percussive sound. The vertical scaling factors are likewise decreased to vary the

degree of modulation of the waveform from high to low. The x positions of the two

points are modified to cause some evolution of the spectral content of the waveform.

At all times, however, the y values are calculated to lie on a single cycle of a

sinewave. This control rule is shown in Figure 5.10.

for i = 1 to 220
    d_1^i = d_2^i = d_3^i = 0.9 (0.99)^i
    x_0^i = 0        x_3^i = 220
    x_1^i = 29 + 0.4 i
    x_2^i = 191 - 0.4 i
    y_0^i = 0        y_3^i = 0
    y_1^i = 15000 (0.9)(0.99)^i sin( 2π x_1^i / 220 )
    y_2^i = 15000 (0.9)(0.99)^i sin( 2π x_2^i / 220 )
next

Figure 5.10 Control rule for time-varying FIF sound. Left, pseudocode where (x_i^j, y_i^j) is the ith interpolation point of the jth FIF and d_i^j is the vertical scaling factor for the ith map of the jth FIF. Right, graphical depiction of the effect on the interpolation points through time.


Figure 5.11 shows the resulting waveform of the complete sound including two

details of individual FIFs shown at the beginning and end of the control sequence.

Also shown is a spectrogram of the first half of the signal. The corresponding sound is

an interesting, complex percussive one which is the combination of a struck, damped

bell and a synthetic 'phasing' effect. This can be heard as Sound 12.

Figure 5.11 Left, time plot of the whole waveform generated with the control rule shown in Figure 5.10, with selected magnifications of individual FIFs to show how the sound develops through time. Right, spectrogram of the first half of the sound showing how it contains complex, time-varying partials similar to those found in naturally occurring musical sounds.

The second example of this technique again uses FIFs generated with 4

interpolation points, but the control rule is slightly different. The end two points are

fixed, as before, but the middle pair are kept at constant y values with their x values

swept in opposite directions. This rule is shown in Figure 5.12. In this case 110 FIFs

were generated of 440 samples each, with all the vertical scaling factors set constant at

0.5. The resulting sound, again, has dynamic spectral properties being similar to that

of a complex tone passing through a swept band-pass filter. This can be heard as

Sound 13.


Figure 5.12 Pictorial representation of the FIF parameter control used to generate the second example of a time-varying FIF sound.

5.6. A Genetic Parameter Control Interface

The results from the previous set of experiments suggest that FIFs are capable of

generating many interesting, unusual and potentially useful synthetic sounds from

simple models. These results, however, comprise a very few selections from the

complete space of FIF sounds, or the corresponding space of FIF parameters. These

selections have been made somewhat arbitrarily, or with the intention of creating a

certain type of sound. Also, the sets of parameters have been generated one at a time

by entering values into a file by hand, or by constructing one-off 'C' programs to

calculate them. A useful development of this work would therefore be to devise a

user-friendly interface that would allow easier navigation through the space of FIF

sounds, and would provide faster feedback to the user of the sound corresponding to

the specified parameters.

In his book, 'The Blind Watchmaker', Richard Dawkins presents a simple

computer-based scheme for demonstrating the cumulative effects of small mutations

on the evolution of algorithmically represented organisms which he calls 'biomorphs'

[dawk88]. The biomorphs are recursively generated line graphics that may take on

complex forms, despite the generative algorithm being simple and it being defined by

only a small number of parameters. The user selects, at each generation, one of

several, slightly mutated variants of a biomorph which then 'survives' and is passed on

to the next generation. The user is acting as 'artificial selection' as opposed to 'natural

selection' and may select the surviving biomorph according to any criteria. In effect, a

small random perturbation is added to the parameters of a biomorph to create the

mutation. The repeated selection at each generation effects a connected path through

parameter space and hence biomorph space. With this scheme, Dawkins demonstrates

how complex, aesthetically appealing, or intended designs may be produced by the

iterated accumulation of small moves in parameter space.


The scheme has also been applied by two computer artists, William Latham and

Karl Sims, as a scheme for evolving complex computer images [lath91] and [sims91].

The images, like the biomorphs, are complex, procedurally generated structures. The

procedures are generally simple, each having a small set of associated parameters,

although the combined effect of many nested procedures is often complex. The

genetic parameter interface is shown to be a powerful tool in exploring the possible

forms that the algorithms can generate.

There is therefore a strong similarity between the biomorph model, the computer

art models and the FIF sound model. All of these attempt to produce aesthetically

interesting or desired results from simple models that generate complex forms. This

section, then, is dedicated to presenting a similar genetic parameter control scheme to

be used for the FIF synthesis algorithm.

5.6.1. Implementation

The genetic models of Dawkins, Latham and Sims contain several common

components based on a model of biological evolution. These are: a population of

organisms each defined by a small set of parameters called their genotype; an

algorithm that expresses the genotype as a complex organism with its own distinctive

characteristics, or phenotype; a user interface that allows selection of either one

organism for mutation or two organisms for mating; and a procedure for

implementing the mutation or combination of genetic material. Such a scheme is

shown in Figure 5.13. These components have been implemented as part of a program

called GEN which runs on an IBM compatible PC and a Texas Instruments

TMS320C30 (C30) digital signal processor (DSP) chip. This hardware combination

provides the necessary interfaces and processing power to realise the GEN scheme.

The hardware organisation is shown in Figure 5.14. The DSP is needed to speed up

the generation of FIFs so that a population of FIFs may be generated in a time that is

acceptable to the user. The program used in the previous section to generate the FIFs

runs only on a PC (a 20MHz, 386 machine) and produces a one second sound in

approximately 30 seconds to 1 minute. Running on the C30 DSP, however, the

processing time is about 2-3 seconds. Therefore, a population of 10 FIFs may be

generated, viewed and heard in about 30 seconds using the DSP, and not in 5-10

minutes as it would if the DSP were not used. This order of magnitude difference is

crucial in making the GEN scheme a viable one.


Figure 5.13 Schematic diagram of the model for biological evolution: a population of genotypes (parameter sets) is expressed as phenotypes (organisms) by the synthesis algorithm; artificial selection of the 'fittest' is followed by modification and/or combination of genes to give the population of the next generation, and the cycle is iterated.

Figure 5.14 Schematic diagram of hardware used for GEN program: a PC processor, with VDU and input device, maintains the evolutionary environment and handles input/output from/to the operator; a C30 DSP card performs fast generation of FIFs and generates serial-format digital audio; a DAT player produces the sound and stores the results.


The evolutionary scheme shown in Figure 5.13 is implemented on the hardware as

follows. The population of organisms, in this case the FIFs, is displayed on the VDU

as a set of time domain waveforms. Simultaneously, as the waveforms are being

displayed, the FIFs may be heard as sounds via the audio output of the DAT. Artificial

selection takes place when the user chooses one of two options. Either a single FIF is

selected for mutation, or a pair of FIFs are selected for mating. In the mutation case, a

mutation factor is also chosen which controls the strength of the mutation. Once the

choice is made, the program running on the PC generates a new genotype from the

user's input information according to the mutation or mating algorithms. These

algorithms are explained below.

Mutation.

The parameters of the single chosen FIF are reproduced to form the basis of the

new population. A random number, whose size is a function of the mutation factor, is

added to each parameter. Let

{ (x_i, y_i) : 0 ≤ i ≤ N } and { d_i : 1 ≤ i ≤ N }    (5.34)

be the interpolation points and vertical scaling factors of the chosen FIF. A new

population of size P,

{ (x_i^j, y_i^j) : 0 ≤ i ≤ N } and { d_i^j : 1 ≤ i ≤ N },  j = 1, 2, ..., P    (5.35)

is then generated where,

x_i^j = x_i + r_x    i = 1, ..., N-1
y_i^j = y_i + r_y    i = 1, ..., N-1
d_i^j = d_i + r_d    i = 1, ..., N
(5.36)

Note that the first and last interpolation points are not modified, which keeps the
length of the FIF equal at all times. r_x, r_y and r_d are random numbers with uniform

pdfs over the ranges

-T/(2λ) ≤ r_x ≤ T/(2λ)

-Q/λ ≤ r_y ≤ Q/λ

-1/λ ≤ r_d ≤ 1/λ
(5.37)

where T is the length of the FIFs in samples, 2Q is the number of amplitude

quantisation levels and λ is a divisor related to the mutation factor, m, according to

λ = 2^(9 - m)    (5.38)

where the mutation factor, m, is an integer in the range 1-9. The range of mutation is

therefore an exponential one that corresponds to mutations whose effects are nearly


imperceptible (m = 1) to a complete randomisation of the parameters (m = 9). All

modified FIF parameters are maintained in their allowed ranges of

0 ≤ x < T,    -Q ≤ y ≤ Q - 1,    -1 ≤ d ≤ 1    (5.39)

using wraparound overflow. Also, after modification the interpolation points are

sorted to maintain the order of ascending x values. An example set of mutated

parameters is shown in Figure 5.15(a).

(a)
x value   y value   factor, d              x value   y value   factor, d
0         0         -                      0         0         -
3000      3830      0.5        becomes     3115      3190      0.56
6000      7070      0.5                    5780      7210      0.51
...       ...       ...                    ...       ...       ...
48000     0         0.5                    48000     0         0.45

(b) [two parent parameter sets, A and B, are combined into a new set whose x, y and d entries are each taken at random from A or B]

Figure 5.15 Example of mutation, (a), and recombination, (b), of FIF parameters.


Mating.

The mating algorithm does not modify the FIF parameters in the same way as the

mutation algorithm, but reuses the information in a new way. As with biological

genetic recombination, information from two genotypes is randomly mixed to produce

information for the new genotype. The operation of the algorithm is shown in Figure

5.15(b). Again, the interpolation points are sorted after

recombination to maintain the correct order.
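The two operators might be sketched in C roughly as follows. The uniform() helper, the per-value recombination and the parameter layout are illustrative assumptions rather than the GEN implementation, and the wraparound of out-of-range values and the re-sorting by ascending x are noted but omitted.

#include <stdlib.h>

static double uniform(double lo, double hi)
{
    return lo + (hi - lo) * (double)rand() / RAND_MAX;
}

/* Mutation: copy the chosen genotype and add a uniform random number, scaled
   by the divisor lambda = 2^(9 - m) of Equation (5.38), to each free parameter. */
void mutate(int N, int T, int Q, int m,
            const double *x, const double *y, const double *d,
            double *xm, double *ym, double *dm)
{
    double lambda = (double)(1 << (9 - m));
    xm[0] = x[0]; ym[0] = y[0];      /* first and last points are never moved */
    xm[N] = x[N]; ym[N] = y[N];
    for (int i = 1; i < N; i++) {
        xm[i] = x[i] + uniform(-T / (2.0 * lambda), T / (2.0 * lambda));
        ym[i] = y[i] + uniform(-Q / lambda, Q / lambda);
    }
    for (int i = 0; i < N; i++)
        dm[i] = d[i] + uniform(-1.0 / lambda, 1.0 / lambda);
}

/* Mating: each value of the child genotype is taken at random from parent A or B. */
void mate(int N, const double *xa, const double *ya, const double *da,
          const double *xb, const double *yb, const double *db,
          double *xc, double *yc, double *dc)
{
    for (int i = 0; i <= N; i++) {
        xc[i] = (rand() & 1) ? xa[i] : xb[i];
        yc[i] = (rand() & 1) ? ya[i] : yb[i];
        if (i > 0) dc[i-1] = (rand() & 1) ? da[i-1] : db[i-1];
    }
}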

After the genetic material for the new population has been generated, the new

population of FIFs is itself generated by passing the parameters, one at a time, to the

DSP which runs the synthesis algorithm. The program has now returned to the

beginning state of Figure 5.13, the choice of organisms may be repeated and the cycle

iterated as many times as desired. It is also possible to initialise the FIF parameters

from a previously created file so that the evolutionary cycle may begin from a known

point.

5.6.2. Experiments

This section presents a number of experiments that illustrate the way in which the

genetic scheme works and demonstrates some example uses.

Figure 5.16 shows a single screen-shot from the program giving an example of

what is presented to the user. In this case, the population size is chosen to be 8, the

FIFs have 6 interpolation points and the synthesis algorithm is iterated 6 times. The

manifestation of the original parameter set, which was created arbitrarily, is shown as

waveform A, while waveforms B to H are mutated versions. In Sound 14 the

corresponding audio output of the program can be heard and consists of the eight FIFs

played in sequence. The similarity between the mutations is evident.


Figure 5.16 A single screen-shot from the program GEN.

Another example of the program in operation is shown in Figure 5.17. This figure

shows the screen shot over many generations. In this example, the parameter set is

initialised so that the x values of the interpolation points are evenly spaced and the y

values and vertical scaling factors, d, are zeroed. In each generation, a single FIF is

chosen and mutated with a mutation factor that was chosen to have quite a high value

of 7 out of 9. The chosen survivor at each generation is shown in more detail in Figure

5.18 and this sequence can be heard as Sound 15.

This example demonstrates the way in which a complex FIF waveform can be

developed from nothing by accumulating small changes in the parameters at each

generation. The similarity between each generation can be clearly seen and

demonstrates the structured exploration of FIF parameter space. In practice, it is found

that a good operational approach is to start with a high mutation factor, and end with a

low one. This permits large jumps in parameter space at the beginning of a session

allowing the exploration of a wide variety of possible FIFs. Once one of interest is

chosen, it can be developed, with a medium mutation factor, so as to keep the general

form, but explore the possible forms around it. Finally, a small mutation factor can be

used to make fine changes to produce the end result.


The GEN program has been used to develop the idea presented in the previous

section of concatenating a number of shorter FIFs to produce a time-varying FIF

sound. Starting from an interesting sounding FIF of approximately one third of a

second duration, an entire evolutionary sequence of FIFs have been concatenated to

form Sound 16. A variety of matings and mutations with a low mutation factor have

been used which ensures a gradual and even development of the sound.

The final two experiments use the GEN program as a means of modifying an

existing parameter set, instead of evolving one from nothing. An approximation to a

desired result is input initially and then the GEN program is used as a tool to modify

the sound within the parameter neighbourhood. In the first experiment, a

rhythm/timbre design made from 6 interpolation points is used as an initial input.

Also, a slight modification has been made to the program GEN. So as to keep the

timing of the rhythm design close to the original, the x values remain unchanged by

the mutation procedure. Four of the evolved results have been selected and then

concatenated to form a longer rhythm/timbre which can be heard as Sound 17.


Figure 5.17 A sequence of populations generated with the program GEN. In this case, the FIFs are produced from 6 interpolation points. At the start (waveform A, top left) all interpolation points and vertical scaling factors are zeroed. At each stage, 7 mutations are produced and then a single survivor is chosen by the operator (starred waveform), which reappears as waveform A in the next generation.


Figure 5.18 Starting point (top left) and sequence of starred waveforms from Figure 5.17 shown in more detail.

The second experiment illustrates a limitation of the genetic interface. A variant of

the set of FIF parameters described in Section 5.3 is used as initial input where the

interpolation points are derived from a sinewave and the x values follow a square law.

In this case, there are 46 interpolation points which is many times more than has been

used so far with the genetic scheme. After the first mutation, it is found that for low


mutation factors the resulting offspring are, perceptually, nearly indistinguishable

from one another. For medium to high factors, the structure of the original data set is

lost and so it is not possible to explore subtle variations of a desired FIF. For example,

Figure 5.19 shows the first generation of mutations for a low mutation factor of 3.

Each mutation sounds like a noisy version of the original FIF. These can be heard as a

sequence in Sound 18. In this example, the number of iterations for the FIF generation

algorithm has been kept to 2 so that the results can be seen clearly.

Despite this problem, it is still possible to evolve interesting FIFs which have large

numbers of parameters. It is, however, easier to create subtle variants of a given FIF

sound when it is defined by only a small number of parameters.

Figure 5.19 Mutated variants of an FIF that is defined by a relatively large number of parameters. It can be seen (and heard) that when this is the case, low-factor mutations are barely distinguishable from one another.


5.7. Conclusions

This chapter has presented an investigation into using FIFs, a form of strange

attractor, as a sound synthesis technique. A variety of experiments and techniques

have been presented including a selection of the resulting sounds themselves. With

these experiments a number of the possibilities of FIF synthesis and some of the

problems have been demonstrated. A number of general conclusions can be drawn.

Most generally, these experiments demonstrate that there is an acoustic equivalent

to abstract fractal images. Even if the sounds generated with the FIF technique are not

as immediately stunning as many fractal images, they are interesting and unusual.

Also, only a small subset of all possible FIF sounds have been explored, and FIF

synthesis is only one technique of synthesising abstract fractal sounds.

The biggest advantage of FIF synthesis, which has been demonstrated by the

experiments, is that FIFs may produce complex sounds despite the model which

generates them being simple and requiring only a small number of parameters. It is

therefore possible to produce a range of interesting, unusual and potentially useful

sounds with a simple, easy to implement and manageable model. Many of the sounds

have qualities quite unlike those generated with other synthesis techniques. It is also

possible to use FIFs for producing musical sound, for example the bell-like tone of

Section 5.5.

Perhaps the most interesting and useful result has been the discovery of a new

class of sounds that are simultaneously rhythms and timbres. It has been found in this

case that the problem with fractal sound suggested by Waschka (see Section 4.3) of a

perceptual discontinuity in the acoustic domain is actually an advantage. Generating

both micro-level timbre and macro-level rhythm with the same structure on different

scales is believed to be a novel technique that has potential use for computer music

composition.

It was found, however, to be difficult to isolate those sounds of interest from the

large class of possible FIF sounds. This is similar to the situation found with IFS

images, [barn88], where the set of aesthetically pleasing or interesting images is very

small relative to the space of all IFS images and is widely scattered within it.

Generally, parameters with a high degree of structure produced the most interesting
sounds. Also, a considerable degree of experimentation

and intuition is required to find the more interesting sounds.

Unlike the case of IFS images, no obvious replicas of naturally occurring sound

have been found so far which parallel something like the IFS fern images. Some FIF


sounds, however, do have elements of naturally occurring sound, for example they

contain certain echoey rumbles, or wind-like noises.

The genetic scheme was devised to allow a more ordered navigation through the

space of FIF sounds and is successful as a captivating, interactive piece of software.

With the genetic scheme it is possible to discover interesting FIF sounds without the

need to think about and provide a set of FIF parameters. It allows sounds to be

discovered by combining both an element of chance and an ordered process.

Finally, it is believed that there are many opportunities for further experimentation

with the FIF synthesis model. Any of the techniques presented may be concentrated on

with more time being spent exploring the sounds that can be generated.


Chapter 6

Modelling Sound with FIFs

In the previous chapter, a synthesis-only scheme was investigated where FIFs are

used as a source of abstract time domain waveforms that are converted into sound. In

this chapter, the focus is on using FIFs as part of an analysis/synthesis model for

representing naturally occurring sound. The chapter begins by exploring the

interpolation capabilities of FIFs by applying the synthesis technique of the previous

chapter to data derived from naturally occurring sound waveforms. The limitations of

this technique prompt a different approach to the modelling problem, which is to

consider the inverse problem for FIFs. Most of this chapter is then dedicated to

investigating a published algorithm which is claimed to be a solution of the inverse

problem. It is shown, however, that this algorithm does not solve the problem

satisfactorily. A modified version of this algorithm is then developed which does give

some successful results.

6.1. Deriving Interpolation Points from Naturally Occurring Sound Waveforms

Fractal waveforms, that are either exactly or statistically self-affine, have the

property that there is a relationship between the information present on different time

scales. This therefore implies that there is some kind of redundancy in the waveform

that could be exploited to compress the waveform within an analysis/synthesis model.

The information common to all scales need only be represented once by the model and

then reused on each scale to reconstruct the original.

FIFs generate all the scales of a waveform from the information provided by the

interpolation points which themselves define the coarsest time scale. This suggests the

possibility of reducing a naturally occurring fractal waveform to a set of interpolation

points which can then be used to reconstruct either the original waveform, or

something with similar fractal properties. The most obvious way to do this is to take

the interpolation points from the original waveform itself, in other words to sub-

sample it. The following set of experiments present an investigation of this idea where

an extract of wind noise is used as the original waveform since this has already been

shown, in Chapter 4, to have statistically self-affine properties. In all the following

experiments, samples are taken from the wind sound waveform and then used directly


as interpolation points for the FIF synthesis algorithm. The amplitude of the sample

becomes the y value of the interpolation point and the sample index is used to derive

the x value.

In the first experiment, the original waveform extract is 5,000 samples long and

the interpolation points are taken regularly every 50 samples. This corresponds to sub-

sampling by a factor of 100. The vertical scaling factors are set to be 0.3 for every

mapping, and the FIF synthesis algorithm is iterated 4 times. These figures have been

chosen so that the resulting FIF approximates, visually, the original waveform. The

original waveform, a piece-wise linear interpolation of the interpolation points and the

resulting FIF can all be seen in Figure 6.1. Despite the apparent similarity between the

original waveform and the FIF, a magnified view reveals that, not surprisingly, the

regular spacing in the x-direction of the interpolation points results in a regular, almost

periodic FIF. This can be seen in the magnified view where the same waveform

pattern is repeated between each pair of interpolation points.

Note that in all the waveform plots shown in this section, the time scale is marked

in 'points', meaning samples. Since the sample rate used to capture the original wind

noise was 48kHz, 48,000 points corresponds to one second.

Figure 6.1 Results of an experiment to extract interpolation points by decimating a wind sound waveform and then constructing an FIF with them: (a) original waveform; (b) piece-wise interpolation of the interpolation points; (c) resulting FIF; (d) magnification of a small portion of (c).


Because of the dependence of the resulting FIF on the x-spacing of the

interpolation points, information has to be extracted from the original of both

amplitude and time patterns. The next idea is to use amplitude zero-crossing and peak

values of the waveform to provide such information.

The interpolation points for the next experiment are chosen to be the points in the

original waveform whose amplitude is maximum in between its zero-crossing points.

This procedure produces well spaced points that capture the general shape of the

original. The original wind waveform and the extracted points, having been piece-

wise linearly interpolated, are shown in Figure 6.2.

Figure 6.2 Original wind sound waveform (top), interpolation of peak points (bottom left), and reconstructed waveform (bottom right).
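The peak-picking rule described above can be sketched as follows, assuming 16-bit samples; the function name and array layout are illustrative.

#include <stdlib.h>

/* One interpolation point per zero-crossing interval, taken at the sample of
   largest magnitude within that interval. Returns the number of points found. */
int extract_peak_points(const short *w, int T, int *px, short *py, int max_points)
{
    int count = 0, start = 0;
    for (int t = 1; t <= T && count < max_points; t++) {
        /* a zero crossing (or the end of the signal) closes the current interval */
        if (t == T || (w[t-1] < 0) != (w[t] < 0)) {
            int best = start;
            for (int s = start; s < t; s++)
                if (abs(w[s]) > abs(w[best])) best = s;
            px[count] = best;      /* x value: sample index of the peak */
            py[count] = w[best];   /* y value: amplitude at the peak */
            count++;
            start = t;
        }
    }
    return count;
}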

Also shown in Figure 6.2 is an FIF constructed using the peak points as

interpolation points and where all vertical scaling factors are set to 0.2. Although the

amplitude of the sections that are mapped in between the interpolation points roughly

match that of the original, there appears to be a discrepancy in their shape, the FIF

having too much high frequency content. This is confirmed by listening to the

waveform which can be heard as Sound 19. The sound is also much rougher than the

softer, smoother sound of the original. There is, however, some element of similarity.

The excessive high frequency content occurs because the ratio of the x-distance

between the beginning and end interpolation point pair and the x-distance between any


two consecutive interpolation points is too high. Consequently, the whole waveform is

mapped to in between too small a space and the resulting detail is too fine.

The last experiment with this technique is an attempt to reduce the high frequency

content of the reconstructed waveform. The idea is to use the same interpolation

points derived from the peaks of the original waveform as above, but to divide them

into a number of sets and generate many shorter FIFs. Having control over the length

of the FIFs in this way then allows control over the ratio of interpolation point spacing

and hence over the degree of detail in the resulting FIF. The interpolation points from

the previous experiment are separated into groups of ten, with the last of one group

forming the first of the next. This ensures continuity between the consecutive FIFs so

they can be concatenated to form the complete resulting waveform. Figure 6.3 shows

the original waveform and the resulting FIF where it can be seen that there is a

stronger visual similarity than for the previous experiments. The resulting sound,

however, is a little disappointing. Although being closer to the original than the other

results, it is still unacceptably different from the original. This can be heard as Sound

20.

Figure 6.3 Section of original wind sound (left) and part of the composite FIF (right) constructed using groups of peak points.

The limited success of the approach taken in this section and also of the heuristic

nature of the investigation suggest that a more rigorous approach to the FIF modelling

problem needs to be taken. The desire is to find an FIF that best approximates a given

waveform and that is as simple, i.e. having a small number of parameters, as possible.

This is the inverse problem for FIFs. Recall that the inverse problem for IFS is, given

some set, find a set of contraction mappings that define an IFS attractor that is as close

as possible, in the Hausdorff sense, to the original set. There have been a number of

approaches to the IFS inverse problem for the case where the set to be approximated is

a two-dimensional quantised image. For example, see [mant89, barn91, fish92 and


mant92]. The results in [fish92] indicate that naturally occurring images may be

modelled with compression ratios of up to 70:1. The approaches taken to the IFS

inverse problem may be divided into two categories: search and optimisation. The

former approach involves systematically searching a subset of the space of all IFS

mapping parameters to find a set that minimises the collage error (see Section

3.11.4.). The latter approach also seeks to minimise the collage error, but using

iterative optimisation techniques. It is not possible, however, to directly apply either

of these techniques to the FIF inverse problem because of the different form of data.

The IFS models use two-dimensional affine mappings to collage a two-dimensional

image. For the FIF model, two-dimensional shear mappings must collage one-

dimensional waveforms. The general approaches of search or optimisation can,

however, be appropriated and applied to the FIF inverse problem. Note that at this

stage of investigation, the aim is not necessarily to find an elegant, efficient algorithm

to solve the inverse problem, but to find out whether it is possible at all to solve the

FIF inverse problem for sound waveforms and model a naturally occurring sound with

a relatively simple FIF.

It was found that an algorithm exists in the literature that attempts to solve the FIF

inverse problem using a search technique. The rest of this chapter is concerned with

assessing this algorithm, applying it to the problem of modelling sound waveforms

and improving its performance.

6.2. Mazel's Time Series Models

Mazel presents four time series models, and their associated inverse algorithms

[maze91] and [maze92]. The models are for general, discrete time series and are based

on FIFs. He calls them the self-affine, piece-wise self-affine, hidden variable and

piece-wise hidden variable fractal models. The self-affine version models a time series

with a single FIF in the same way as has been investigated in this and the last chapter.

The other three models are more complicated and use the recurrent and higher

dimensional variants of an FIF, see [barn88] and [barn89]. This section reviews the

results obtained with these models and their associated inverse algorithms as reported

by Mazel. A comparison of the results with the performance of amplitude

requantisation puts Mazel's models into context as compression algorithms.

Mazel presents results for each of his algorithms using a number of different time

series. The performance of the algorithms is measured by the degree of compression

obtained, and the resulting amount of degradation of the time series. The degree of

compression is measured as the ratio of the number of bits used to represent the


original time series to the number of bits required to represent the FIF parameters. In

most cases, the FIF parameters are quantised to the smallest number of bits such that

no further errors are introduced by the model. This is possible because Mazel shows

there is a threshold in the accuracy of parameter representation with respect to

degradation of the original time series [maze91]. The degradation is measured as a

signal to noise ratio (SNR). The SNR is defined as the power in the original signal

divided by the power of the error signal introduced by the model. The error signal is

the difference between the original signal and the FIF version. Let the original signal

(time series) be

x_t, \quad t = 0, 1, \ldots, T-1    (6.1)

and let the FIF version be

f_t, \quad t = 0, 1, \ldots, T-1    (6.2)

then the SNR is given by,

\mathrm{SNR} = 10 \log_{10} \left( \frac{\sum_{t=0}^{T-1} x_t^2}{\sum_{t=0}^{T-1} (x_t - f_t)^2} \right) \ \mathrm{dB}    (6.3)

These measures of compression ratio and SNR quantify how well the algorithm compresses the original time series. The aim of a successful compression algorithm is to obtain values of both measures that are as high as possible.
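As a point of reference, the SNR of Equation (6.3) can be computed directly from a pair of sample sequences. The following fragment is only an illustrative restatement of the definition (it assumes NumPy arrays of equal length and is not part of Mazel's software):

    import numpy as np

    def snr_db(x, f):
        """SNR of Equation (6.3): the power of the original signal x divided by
        the power of the error signal (x - f), expressed in decibels."""
        x = np.asarray(x, dtype=float)
        f = np.asarray(f, dtype=float)
        error = x - f                  # error signal introduced by the model
        return 10.0 * np.log10(np.sum(x ** 2) / np.sum(error ** 2))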

Table 6.1 shows a summary of the results reported by Mazel for all four of his

models and a variety of signal types. The first thing that can be noticed is that the self-

affine model appears to perform substantially better than the other models. Mazel,

however, only presents this one result for the self-affine model and so it is not

possible to know from his experiments whether this is a freak occurrence, or an

example of the performance which is typically to be expected. By inspection of the

mountain profile data plotted in [maze91] it can be seen that approximately one third

of it is close to being a straight line. This might account for the good performance

relative to the other model/signal type combinations.


Model type                 | Time series       | Number of bits per sample of original | Compression ratio | SNR in dB
self-affine                | mountain profile  |  9 | 22:1  | 35
piece-wise self-affine     | well logging data | 16 | 6.4:1 | 16.8
piece-wise self-affine     | ECG               | 12 | 5.3:1 | 23.2
piece-wise self-affine     | seismic data      | 12 | 5.4:1 | 19.7
piece-wise self-affine     | speech            | 16 | 6.4:1 | 16.9
hidden variable            | sunspot data      |  8 | 4.6:1 | 10.1
hidden variable            | ECG               | 12 | 7.5:1 | 12.1
piece-wise hidden variable | seismic data      | 12 | 9.2:1 | 10.8
piece-wise hidden variable | speech            | 16 | 5.6:1 | 15.5

Table 6.1 Summary of the results obtained by Mazel for his four FIF based models/inverse algorithms [maze91 and 92].

It can also be seen that there is a general relationship between the SNR and

compression ratio such that as the compression ratio increases, the SNR decreases.

This is to be expected as more errors are likely to be generated as the signal is further

compressed. How, though, are these results to be interpreted? It would be instructive to compare the relationship between compression ratio and SNR with that of other compression schemes, since Mazel does not do this himself. To put Mazel's results in perspective, then, they are compared below with the theoretically expected performance of an amplitude requantisation of the original time series.

6.3. Comparison with Requantisation

Amplitude requantisation is, effectively, a simple and crude way of compressing a

time series by discarding some of the sample amplitude information. It therefore

provides a simple benchmark against which any compression scheme may be

compared. If the performance measures of a compression scheme are equal to or

worse than those of requantisation, then it would be just as effective, and easier, to

discard sample information to compress the signal. However, Mazel's original signal

data is unavailable to requantise so that a direct comparison with his FIF models

cannot be made. It is therefore necessary to make a comparison of the FIF models'

performance with the theoretically expected performance of requantisation.


The following analysis approximates the errors involved in the requantisation

process under certain general conditions so that an idea of the relationship between

compression and SNR can be obtained.

Begin with a general, complex, digital signal,

x_t, \quad t = 0, 1, \ldots, T-1    (6.4)

whose samples have been linearly quantised to r amplitude levels, or \log_2 r bits. For example, consider the signal to be a general digital audio time series. Assume the amplitude range is normalised so that

|x_t| \le 1    (6.5)

Let the amplitudes of the samples be requantised, by rounding, to q levels where

q < r    (6.6)

The full original amplitude range of 2 will be mapped onto the q levels of the requantised signal, and therefore original amplitude ranges of size 2/q will be mapped onto the individual quantisation levels. Let the requantised signal be \tilde{x}_t. The requantisation process will generate an error signal, \epsilon_t, where

\tilde{x}_t = x_t + \epsilon_t    (6.7)

The maximum amplitude error, per sample, of the requantisation process will be of magnitude

\frac{1}{q}    (6.8)

This is demonstrated in Figure 6.4.

Figure 6.4 Mapping of amplitudes in the requantisation process.
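The rounding requantisation illustrated in Figure 6.4 is straightforward to simulate. The sketch below is an illustrative implementation of the process just described (the function name and the use of NumPy are assumptions, not taken from the thesis software); it maps a normalised signal onto q levels and returns both the requantised signal and the error signal:

    import numpy as np

    def requantise(x, q):
        """Requantise a normalised signal (|x| <= 1) to q amplitude levels by
        rounding, so that the per-sample error magnitude is at most 1/q, as in
        Equation (6.8).  Returns the requantised signal and the error signal."""
        x = np.asarray(x, dtype=float)
        step = 2.0 / q                                  # width of one level
        xq = np.clip(np.round(x / step) * step, -1.0, 1.0)
        return xq, xq - x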

It is common in the digital signal processing literature to assume, under these

conditions, that the quantisation error signal will be a zero mean, uniformly


distributed, white noise process that is uncorrelated with the original signal [carl86]. The amplitude probability distribution function of the error signal, p(\epsilon_t), will be constant over the range

-\frac{1}{q} \le \epsilon_t \le \frac{1}{q}    (6.9)

and have a value of \frac{q}{2} so that the total area under the pdf is unity. The power of the error signal is equal to its expected square value, or its variance. That is,

P_\epsilon = E[\epsilon_t^2] = \int_{-1/q}^{1/q} \epsilon_t^2 \, p(\epsilon_t) \, d\epsilon_t = \frac{q}{2} \left[ \frac{\epsilon_t^3}{3} \right]_{-1/q}^{1/q} = \frac{1}{3q^2}    (6.10)

To estimate the maximum signal to noise ratio, SNR_max, of the original signal to the requantisation noise signal, consider that the maximum power of the original is limited to P_x \le 1 because it has normalised amplitude. Therefore the upper bound on the SNR is

\mathrm{SNR}_{max} = 10 \log_{10} \left( \frac{1}{1/(3q^2)} \right) = 10 \log_{10} (3q^2) \ \mathrm{dB}    (6.11)

or, in terms of bits,

\mathrm{SNR}_{max} = 10 \log_{10} (3 \cdot 2^{2b}) \ \mathrm{dB} = 4.77 + 6.02b \ \mathrm{dB}    (6.12)

where

q = 2^b    (6.13)

This gives an approximate relationship for the expected SNR for a simple, rounding requantisation of the original time series. Note, however, that the result obtained is an upper bound on the SNR. In practice, a signal might not have the maximum power assumed, and so the SNR is likely to be smaller than that given by this result. The effective compression of the requantisation process is simply given by

\frac{\log_2 r}{\log_2 q}    (6.14)

Now compare the performance of requantisation with that of one of Mazel's

algorithms. Take, for example, the case of using speech as the input time series for the

piece-wise self-affine model. According to the results shown in Table 6.1, a

compression of 6.4:1 is obtained with a corresponding degradation described by a

SNR of 16.9dB. Since the original signal was quantised to 16 bits, a requantisation

achieving the same compression would require 16/6.4=2.5 bits per sample. According

to Equation (6.12), an expected maximum SNR for the requantisation would be

4.77 + 6.02 \times 2.5 = 19.8 \ \mathrm{dB}    (6.15)


This performance is of the same order as, and under the assumptions made in fact better than, that obtained with Mazel's algorithm. This result suggests, then, that at first inspection Mazel's algorithm is no better than a simple requantisation of the original time series.
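The bound of Equation (6.12) and the worked example of Equation (6.15) can be reproduced with a few lines; this is only a restatement of the arithmetic above:

    import math

    def requantisation_snr_bound(bits_per_sample):
        """Upper bound of Equation (6.12): SNR_max = 4.77 + 6.02 b dB."""
        return 10.0 * math.log10(3.0) + 20.0 * math.log10(2.0) * bits_per_sample

    # Worked example of Equation (6.15): 16-bit speech compressed by 6.4:1 is
    # equivalent to 16 / 6.4 = 2.5 bits per sample.
    print(round(requantisation_snr_bound(16 / 6.4), 1))   # 19.8 (dB)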

To compare all of Mazel's results with the theoretical performance of

requantisation, Figure 6.5 shows graphs of SNR against compression ratio. Four

graphs are shown to account for the fact that the original time series used by Mazel are

originally quantised to different numbers of bits. As a result of this, although the

theoretical SNR remains the same when requantising to a certain number of bits (as

long as it is less than the original), the resulting compression ratio changes.

The performance of Mazel's algorithms relative to the theoretically derived

requantisation performance is indicated by the position of the SNR/compression data

pairs relative to the line. Pairs below the line indicate performance that is worse than

requantisation, on or near the line indicates similar performance, and above the line

indicates better performance. As can be seen, most of the results are worse, or only

slightly better, than the theoretically expected performance of requantisation. The

exceptional case is, as already mentioned, that of the mountain profile time series with

the self-affine model, whose performance is, by far, the best.

In order to try and confirm these findings, and to further experiment with Mazel's

techniques, the following section presents results for a reimplementation of one of

Mazel's algorithms using complex sounds as input.


Figure 6.5 Degradation against compression performance of Mazel's inverse algorithms for a variety of data and model types compared with the theoretically expected performance of requantisation.


6.4. Mazel's Inverse Algorithm for the Self-Affine Model

The self-affine model is chosen for reimplementation for several reasons. Most

importantly, this model is of the same form as that used for the experiments in the

previous chapter where a portion of sound waveform is represented by a single FIF.

Secondly, it is the simplest of Mazel's model/algorithm pairs and is therefore the

easiest to reimplement. Finally, as discussed in the review of Mazel's results, more

experiments are needed with the self-affine model to determine its typical behaviour.

Mazel's inverse algorithm for the self-affine model is a search technique that seeks

to find a set of FIF parameters given some time series. The parameters found are such

that they define an FIF attractor that approximates the original time series. The FIF

parameters consist of a set of interpolation points and vertical scaling factors. The

interpolation points are derived from the samples of the original time series and so the

search is for a smaller subset of the original samples that define the resulting FIF. The

search exploits the collage theorem to find this subset of samples.

Recall that the collage theorem allows an error criterion to be established for the

IFS inverse problem (see Section 3.11.4). Given some original set, the collage

theorem states that a collection of contraction mappings will define an IFS attractor

that is close to the original set if the mappings form a close collage of the original set.

A close collage is one where the difference, or error, between the original set and the

collage of that set is small. Mazel's algorithm applies the collage theorem by searching

for a set of interpolation points whose associated shear mappings form a good collage

of the original time series.

Let the original time series be the set of samples,

\{u_t : t = 0, 1, \ldots, T-1\}    (6.16)

and let the interpolation points to be found be

\{(x_n, y_n) : n = 0, 1, \ldots, N\}    (6.17)

These are restricted to be a subset of the original time series samples, so

(x_n, y_n) \in \{(t, u_t) : t = 0, 1, \ldots, T-1\}    (6.18)

which in effect means that only the x positions of the interpolation points need to be

found since the y values are implied by the original time series sample values. It is

also necessary to find a set of vertical scaling factors, one for each consecutive pair of interpolation points,

\{d_n : n = 1, 2, \ldots, N\}    (6.19)

The main function of the algorithm is to test pairs of consecutive interpolation

points chosen from the original time series samples. To begin the search, the left-hand

point of a pair is fixed on the first sample of the original time series, i.e.

(x_0, y_0) = (0, u_0)    (6.20)


The second, or right-hand, point of the pair is then tested at every value of t along the

original time series, except the very closest to the left-hand point, i.e. from

x_1 = 2, 3, \ldots, T-1    (6.21)

The closest point is not tested because a neighbouring pair of samples define a trivial

mapping. An example pair of interpolation points is shown in Figure 6.6.

Figure 6.6 First trial pair of interpolation points on the original time series graph.

Figure 6.7 Mapping of the whole time series to in between the first pair of interpolation points.

Each test involves a calculation of the collage error for the piece of collage defined

by mapping the whole original time series waveform to in between the trial pair of

interpolation points. Each test consists of the following steps:


- calculate a value of the vertical scaling factor for the pair of interpolation points so that a shear mapping is defined,
- apply the shear mapping to the whole time series to form a collage of the portion of original time series between the pair of interpolation points (this is shown in Figure 6.7),
- calculate the closeness of fit, i.e. the error, of this piece of collage.

The result of each test is a single error value which is then temporarily stored. At

the end of the sequence of tests, a collage error is known for each possible position of the mobile right-hand interpolation point, (x_1, y_1). These errors are then compared so as to determine which position of (x_1, y_1) generated the lowest error and therefore the

best collage. The chosen position is then stored, with its associated vertical scaling

factor, as part of the resulting FIF parameter set.

The search sequence is then repeated, but the fixed point is made to be (x_1, y_1), and a new trial point, (x_2, y_2), is introduced. This is then tested at every position along the original time series,

(x_2, y_2) \in \{(t, u_t) : t = x_1 + 2, \ldots, T-1\}    (6.22)

Comparison of the resulting test errors yields another part of the final solution.

This routine is repeated until the last trial interpolation point is chosen to be at, or

near, the end of the time series. The result is then a set of interpolation points that

define mappings that form a collage of the original time series.

In the test procedure, the vertical scaling factor of the mapping associated with

each trial pair of interpolation points is calculated so that the maximum vertical extent

of the mapped original time series equals the maximum vertical extent of the original

time series between the pair of interpolation points. This is illustrated in Figure 6.8.

Although this method does not minimise the resulting collage error, it is used because

it is simple to implement and gives a good approximation to the optimal value. Mazel

experiments with other more complicated methods, but the corresponding results are

not significantly different [maze91].


Figure 6.8 Maximum vertical extent of part of the original time series between a pair of consecutive interpolation points and the maximum vertical extent of the mapped original time series. The vertical scaling factor is calculated so as to make these two extents equal.

The collage error for a piece of collage is calculated as the mean square difference

in amplitudes between mapped and original time series. Let

(x_n, y_n), (x_{n+1}, y_{n+1})    (6.23)

be an interpolation point pair with an associated shear mapping (see Section 5.1),

w_n \begin{pmatrix} t \\ u \end{pmatrix} = \begin{pmatrix} a_n & 0 \\ c_n & d_n \end{pmatrix} \begin{pmatrix} t \\ u \end{pmatrix} + \begin{pmatrix} e_n \\ f_n \end{pmatrix}    (6.24)

where the vertical scaling factor, d_n, is defined by setting the maximum vertical extents equal. Let w_n map the original time series,

\{u_t : t = 0, \ldots, T-1\}    (6.25)

into

\{v_t : t = x_n, \ldots, x_{n+1}\}    (6.26)

where

v_j = c_n t + d_n u_t + f_n    (6.27)

and

j = \mathrm{int}(a_n t + e_n)    (6.28)

for t = 0 \ldots T-1 (see Section 5.2). The collage error is then found from

\Delta_n = \sum_{t=x_n}^{x_{n+1}} (u_t - v_t)^2    (6.29)

What is convenient about Mazel's search algorithm is that no value for the number

of mappings, N, is chosen in advance. N is determined by the operation of the search

since it depends on the position of the interpolation points with the lowest collage

errors. As a result, however, only a partial search of all possible configurations of

interpolation points is carried out. That is, only a subset of the full parameter space is


explored. For example, although the position of (x_1, y_1) is chosen to minimise the collage error for the section between (x_0, y_0) and (x_1, y_1), it is then fixed. This

precludes a solution where the error for this section may be suboptimal, but the overall

collage error will be lower. For more details on the operation of Mazel's algorithms,

see [maze91] and [maze92].
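To make the structure of the search concrete, the following is a minimal sketch of a greedy collage search in the spirit of the algorithm described above. It is not Mazel's implementation: the vertical scaling factor is approximated simply as the ratio of the vertical extents, and the affine mapping of every sample is replaced by a nearest-sample resampling, so it should be read as an illustration of the search strategy rather than as a faithful reproduction.

    import numpy as np

    def collage_segment(u, xl, xr, d):
        """Map the whole series u onto the interval [xl, xr] with a shear mapping
        whose end points hit (xl, u[xl]) and (xr, u[xr]) and whose vertical
        scaling factor is d.  Nearest-sample resampling stands in for the exact
        affine mapping of every sample (a simplification)."""
        T = len(u)
        j = np.arange(xl, xr + 1)
        t = np.round((j - xl) * (T - 1) / (xr - xl)).astype(int)  # source samples
        c = (u[xr] - u[xl]) / (T - 1) - d * (u[-1] - u[0]) / (T - 1)
        f = u[xl] - d * u[0]
        return c * t + d * u[t] + f

    def greedy_search(u, min_gap=2):
        """Greedy interpolation-point search in the spirit of Mazel's algorithm:
        fix the left point, test every admissible right point, keep the one with
        the smallest collage error, then advance and repeat."""
        u = np.asarray(u, dtype=float)
        T = len(u)
        extent = np.ptp(u)
        points, scalings = [0], []
        left = 0
        while left < T - min_gap:
            best = None
            for right in range(left + min_gap, T):
                seg = u[left:right + 1]
                # vertical scaling chosen so the vertical extents match (Fig. 6.8)
                d = np.ptp(seg) / extent if extent > 0 else 0.0
                err = np.mean((seg - collage_segment(u, left, right, d)) ** 2)
                if best is None or err < best[0]:
                    best = (err, right, d)
            _, right, d = best
            points.append(right)
            scalings.append(d)
            left = right
        return points, scalings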

6.4.1. Initial Results

The algorithm, as outlined above, has been implemented so that sounds may be

used as the original time series. The algorithm has been given the additional capability

that a time series may be processed as a number of consecutive shorter sections. That

is, an original time series of length T_tot may be modelled as m separate time series of length T with m concatenated FIFs so that

T_{tot} = mT    (6.30)

This capability has been added for a number of reasons. Firstly, it allows the

performance of the model/algorithm to be evaluated as an average which is considered

to give a more reliable result than the performance for a single FIF. So, for example,

the performance of a T length FIF may be averaged over m sections of a time series

taken from the same source. Secondly, it allows variation of the FIF length to see if this variable affects the performance of the algorithm. Thirdly, it allows long sound

time series to be processed without prohibitive processing time.

To see this last point, consider the following analysis of the computational

processing time of the algorithm. Consider the worst case, in which the algorithm has to evaluate the greatest number of tests for a given original time

series. Let T be the length of the original time series. The first interpolation point is

fixed at t=0 and the second is tested at t=2...T-1 giving T-2 tests. The worst case

giving the most number of tests is when the best collage is found to be the one for

which the second interpolation point is at t=2. The next sequence of tests for the third

interpolation point must then cover the values t=4...T-1 giving T-4 tests. If the testing

routines continue in this way so that the last sequence of tests is from t=T-3...T-1

giving 2 tests, then the total number of tests will be

(T-2) + (T-4) + (T-6) + \ldots + 4 + 2    (6.31)

That is,

\sum_{n=1}^{(T-2)/2} 2n    (6.32)

which is a finite arithmetic series, and so the expression for the total number of tests becomes

\frac{T^2 - 2T}{4}    (6.33)

The computational processing time of the inverse algorithm is therefore of order O(T^2). Processing a long sound time series as a number of shorter FIFs will therefore require less computational time than processing it as a single long FIF.
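The worst-case count of Equation (6.33) is easily checked by enumeration; the following lines are an illustrative check, assuming an even section length T:

    def worst_case_tests(T):
        # (T-2) + (T-4) + ... + 4 + 2, as in Equation (6.31)
        return sum(range(T - 2, 0, -2))

    for T in (10, 100, 1000):
        assert worst_case_tests(T) == (T * T - 2 * T) // 4   # Equation (6.33)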

The first experiment with the reimplementation of the algorithm is with a range of

different sound time series. Table 6.2 shows a summary of the resulting performance

figures. Because the original sound time series have been processed as a number of

shorter sections, each modelled as an individual FIF, the figures for compression ratio

and SNR give an average performance for the types of sound used.

original sound time series | length of resulting FIF | number of shear mappings used | compression ratio | SNR in dB
wind noise           | 996 | 368 | 1.28:1 | 35.4
filling bath         | 994 | 364 | 1.29:1 | 10.4
industrial roomtone  | 995 | 397 | 1.18:1 | 41.3
river                | 996 | 395 | 1.19:1 | 19.8
gong                 | 995 | 428 | 1.09:1 | 25.6
violin               | 993 | 467 | 1.0:1  | 31.8

Table 6.2 Summary of results for reimplementation of Mazel's algorithm for the self-affine model. Each original time series of length T_tot has been processed as m=10 sections of length T=100.

number of sections | section length | length of output | number of mappings | compression ratio | SNR in dB
10 | 10 |  93 |  38 | 1.15:1 | 35.7
10 | 20 | 196 |  88 | 1.05:1 | 35.5
10 | 30 | 295 | 121 | 1.15:1 | 35.9
10 | 40 | 396 | 163 | 1.14:1 | 34.5
10 | 50 | 494 | 195 | 1.19:1 | 10.2
10 | 60 | 593 | 228 | 1.23:1 | 12.9
10 | 70 | 693 | 293 | 1.11:1 | 37.1
10 | 80 | 796 | 321 | 1.17:1 | 37.7
10 | 90 | 893 | 387 | 1.09:1 | 34.9

Table 6.3 Running the algorithm with wind noise as the original time series for a variety of section lengths T.


Note that the algorithm often produces an FIF model of slightly fewer samples

than that asked for. For example, when asked to process 10x100 sample sections, the

algorithm produces an FIF model of only 996 samples. This is a consequence of the

algorithm choosing a trial interpolation point that is close to the end of the section of

original time series. Instead of then forcing another interpolation point to be at the

very end of the section, which will probably form a bad collage, the algorithm is

allowed to terminate.

One time series, that of wind noise, has also been processed with a variety of

section lengths to see if this has an effect on performance. These results are shown in

Table 6.3. All of the performance figures shown in the tables have been calculated in

the same way as by Mazel to enable direct comparison. In particular, it is assumed that

the interpolation points may be represented with 9 bits for both the x and y values,

whereas 16 bits are required for the vertical scaling factors. Since the original time

series are all quantised to 16 bits, the compression ratio is found from

\frac{16 \, T_{tot}}{34 \, N_{tot}}    (6.34)

where N_tot is the total number of mappings used, equal to the sum of the values of N, the number of mappings used per FIF section.
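With the bit allocations assumed above (9 + 9 bits per interpolation point and 16 bits per vertical scaling factor, i.e. 34 bits per mapping against 16 bits per original sample), Equation (6.34) can be evaluated directly. A small illustrative sketch:

    def fif_compression_ratio(total_samples, total_mappings):
        """Equation (6.34): 16 bits per original sample against 34 bits
        (9 + 9 + 16) per shear mapping."""
        return (16 * total_samples) / (34 * total_mappings)

    # e.g. the wind noise entry of Table 6.2: 996 samples and 368 mappings
    print(round(fif_compression_ratio(996, 368), 2))   # roughly 1.3:1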

As can be seen from the tables, the results are of a very different nature compared

with Mazel's single result for the mountain profile. For this reimplementation there is

barely any compression at all, while the algorithm makes the time series deteriorate to

varying degrees. Changing the length of each section over the range shown does not affect this result. By inspection of the operation of the algorithm, the problem appears to be that the successful tests, when an interpolation point position is on trial, occur for very small distances between the interpolation point pairs. The result is to generate

too many interpolation points which results in a low compression ratio. The choice of

interpolation points is determined by the closeness of the collage measured by the

collage error. The small distance between interpolation points therefore gives rise to

the lowest error. This, however, is a consequence of limited resolution. As discussed

in Chapter 5, when the whole time series is mapped to in between close interpolation

points there is effectively a severe sub-sampling. The error is then measured by

comparing a few points of the original with the few points of the mapped time series.

The detail of the mapped time series is therefore lost and does not contribute to the

error.


6.4.2. Error Weighting

A possible solution to this problem is to weight the error according to the distance

between the interpolation points so as to positively discriminate for those which are

further apart when a choice is made. Increasing this distance will increase the amount

of compression. This idea has been implemented by weighting the error as a function

of the trial interpolation point pair spacing. The weighting function is a linear one

constructed so that the error is decreased proportionally to the distance between the

interpolation point pair. The gradient of the linear weighting function may be varied

via a gradient parameter which is normalised to lie in the range 0 to 1. When the parameter is 0, the weighting is constant and equal to unity, and so the algorithm is no different from Mazel's original. At the other end of the range, when the parameter is 1, the function is designed so that for the greatest distance between the trial pair of interpolation points the collage error is weighted to be near zero, virtually guaranteeing that that pair of interpolation points will be chosen. Choices of the parameter within this range allow control over the spacing of the resulting interpolation points and therefore over the degree of compression achieved by the algorithm. The error weighting function is illustrated in Figure 6.9.
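A weighting of this kind can be written down in a few lines. The sketch below is illustrative only; the gradient parameter is here called alpha, a name chosen for the illustration rather than taken from the text:

    def weighted_error(error, x_fixed, x_trial, T, alpha):
        """Scale a collage error by a linear function of the spacing between the
        fixed interpolation point and the trial point.  alpha = 0 leaves the
        error unchanged; alpha = 1 drives the weighted error towards zero for
        the most distant admissible trial point (t = T - 1)."""
        spacing = x_trial - x_fixed
        max_spacing = (T - 1) - x_fixed
        return error * (1.0 - alpha * spacing / max_spacing)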

The results shown in Table 6.4 and again in Figure 6.10 show how the

performance figures are affected by error weighting. Again, wind noise has been used

as the original time series.

Figure 6.9 Error weighting function parameterised by the gradient parameter.


gradient parameter | length of output | mappings used | compression ratio | SNR in dB
0   | 994 | 421 | 1.11:1 | 35.4
0.1 | 999 | 397 | 1.19:1 | 29.2
0.2 | 999 | 348 | 1.35:1 | 19.5
0.3 | 999 | 308 | 1.53:1 | 14.6
0.4 | 999 | 242 | 1.94:1 | 9.5
0.5 | 999 | 207 | 2.27:1 | 8.9
0.6 | 999 | 167 | 2.82:1 | 7.5
0.7 | 999 | 132 | 3.56:1 | 6.84
0.8 | 999 | 102 | 4.6:1  | 5.67
0.9 | 999 |  77 | 6.14:1 | 3.46
1.0 | 999 |  55 | 8.56:1 | 1.06

Table 6.4 Results of error weighting the inverse algorithm for a range of weighting function gradients. The original time series is wind noise and is processed as 10x100 sample sections.

Figure 6.10 Graph of the results shown in Table 6.4.


These results show that the compression ratio may indeed be controlled with the

parameter . It can also be seen that there exists the same kind of relationship between

compression ratio and signal to noise ratio where, as the amount of compression

increases, the quality of the resulting signal decreases.

To put these results in perspective, as was done with Mazel's results, it is

necessary to plot them alongside the performance of requantisation. This was done with Mazel's algorithms by plotting the theoretically expected performance of requantisation. It is now also possible to plot the actual performance of requantising the

original sound time series, as the original data is now available. This provides a fairer

and more realistic comparison for the FIF inverse algorithm. It also allows a test of the

requantisation theory and a comparison between the actual and theoretically expected

performance. It is expected that the actual requantisation performance will be worse

than predicted, because the original signal does not contain the maximum power for

its dynamic range. As a result, the FIF model performance should appear improved

relative to requantisation.

Figure 6.11 shows a comparison of the theoretically expected and actual

requantisation performance with that of the error-weighted, self-affine model using

wind noise as input. The actual requantisation has been carried out by rounding the

sample amplitudes to simulate the process described in Section 6.3.

Figure 6.11 Comparison of performance between requantisation and the error-weighted version of Mazel's algorithm. The original is 1000 samples of wind noise which is processed as 10x100 sample sections.


Two things can be seen from the graph shown in Figure 6.11. Firstly, the actual

requantisation performance is of the same order as that which is theoretically

expected. Also, as anticipated, the actual requantisation performance is slightly worse

due to the original time series not having the full power assumed in the theoretical

model. Secondly, the graph shows that the degree of compression relative to signal

degradation is no better than that achieved by requantisation. This result, in

conjunction with Mazel's original results and those obtained for the other sound time

series, therefore suggests that the FIF model is no model at all. There appears to be no

advantage to representing a time series with an FIF. The modelling process is

computationally costly and yields results that are worse than those obtained by simply

requantising the original signal.

6.4.3. Interpolation Point Range Restriction

By studying the working of the inverse algorithm with error weighting, it is found

that the poor performance is still related to the distance between the chosen

interpolation point pairs. The process of error weighting is intended to stop close

spacing of the interpolation points. Unfortunately, the effect is to continue to produce

closely spaced points, but to also produce a number of widely spaced points as well.

The combination of both close and widely spaced interpolation points produces poor

results. The close interpolation points decrease the compression ratio, while the

distant ones considerably reduce the SNR. What is needed is for the algorithm to

choose interpolation points that are more evenly spaced and lie somewhere in between

the two extremes described above.

To test this hypothesis, the algorithm has been further modified so as to restrict the

range of the trial interpolation point. That is, instead of allowing it to be tested for the

whole range of positions between the fixed interpolation point and the end of the time

series, it is tested within a chosen window of values. The restriction window is

described by its left- and right-hand positions, in samples, relative to the section of

original time series being processed. Let these positions be l and r respectively, where

2 \le l \le T-3    (6.35)

and

l + 2 \le r \le T-1    (6.36)
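In the search loop, the restriction amounts to replacing the full trial range with a window of positions. The following fragment is an illustrative sketch only, and it assumes the window is measured from the current fixed interpolation point and clipped at the end of the section:

    def trial_positions(x_fixed, l, r, T):
        """Admissible trial-point positions when the search window spans l to r
        samples beyond the fixed interpolation point, clipped to the end of the
        section (cf. Equations (6.35) and (6.36))."""
        start = min(x_fixed + l, T - 1)
        stop = min(x_fixed + r, T - 1)
        return range(start, stop + 1)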

This fixed window, like error weighting, allows control over the distances

between the interpolation points and therefore over the degree of compression. Table

6.5 shows the performance figures where the original time series is wind noise and a


range of different window positions and lengths are used. These performance figures

are plotted alongside the requantisation performance in Figure 6.12.

As can be seen from the graph, the effect of restricting interpolation point

positioning is to significantly improve the performance of the algorithm. Now the

performance is, in most cases, better than that of both the theoretically expected and

actual requantisation. For high compression ratios, the performance relative to actual

requantisation is considerably better. For example, requantising the wind noise to give

a compression ratio of 5.3:1 results in a SNR of 13.1dB. A similar degradation of

14.1dB produced by using the modified algorithm corresponds, however, to a

compression of 16.7:1. This is over three times the amount. For a visual comparison

of an extract of the original time series with the resulting FIF version, see Figure 6.13.

left of window, l | right of window, r | window width | length of output | mappings used | compression ratio | SNR in dB
 5 | 15 | 10 | 981 | 151 |  3.1:1 | 30.0
10 | 20 | 10 | 970 |  79 |  5.8:1 | 25.2
15 | 25 | 10 | 930 |  54 |  8.1:1 | 22.6
20 | 30 | 10 | 952 |  40 | 11.2:1 | 19.7
18 | 22 |  4 | 972 |  50 |  9.2:1 | 19.4
10 | 30 | 20 | 962 |  73 |  6.2:1 | 25.7
15 | 20 |  5 | 938 |  55 |  8.0:1 | 22.9
15 | 30 | 15 | 922 |  52 |  8.4:1 | 22.6
15 | 40 | 25 | 941 |  52 |  8.5:1 | 22.7
15 | 80 | 65 | 941 |  49 |  9.0:1 | 17.2
20 | 25 |  5 | 883 |  40 | 10.4:1 | 16.3
20 | 40 | 20 | 942 |  39 | 11.4:1 | 19.6
25 | 50 | 25 | 893 |  29 | 14.5:1 | 14.6
30 | 60 | 30 | 921 |  26 | 16.7:1 | 14.1

Table 6.5 Performance of the modified FIF inverse algorithm with a specified window restricting the range of the trial interpolation point.


Figure 6.12 Comparison of performance of the window restricted inverse algorithm with that of requantisation. The original time series is wind noise and is processed as 10x100 sample sections.

Figure 6.13 Waveform plot of original wind noise (left) and compressed FIF version (right) using the modified inverse algorithm. The compression ratio in this case is 8.1:1, and the SNR is 22.6dB.


sound                 | length of output | mappings used | compression ratio | SNR in dB
gong                  | 943 | 52 | 8.5:1 |  1.9
river                 | 934 | 51 | 8.6:1 |  2.4
filling bath          | 956 | 52 | 8.7:1 |  0.9
industrial roomtone   | 959 | 55 | 8.2:1 | 23.4
violin                | 961 | 57 | 7.9:1 | 10.0
chattering crowd      | 950 | 55 | 8.1:1 |  8.4
city skyline          | 928 | 54 | 8.1:1 | 21.9
Ecuadorian rainforest | 948 | 51 | 8.8:1 |  2.3
laboratory roomtone   | 953 | 53 | 8.5:1 | 15.5
audience laughter     | 935 | 55 | 8.0:1 | 12.4
seawash               | 932 | 54 | 8.1:1 |  8.4

Table 6.6 Performance figures for the window restricted inverse algorithm using a variety of sound time series. Each original time series is processed as 10x100 sample sections and the restriction window is set at l=15 and r=25 samples.

Figure 6.14 Column chart showing the performance figures given in Table 6.6 for a variety of different original sound time series.


As a final experiment, the modified inverse algorithm is applied to a wide range of

other sound time series. The restriction window is fixed and chosen to give

approximately 8:1 compression. The degradations for each type of sound may then be

compared. The performance figures for this experiment are shown in Table 6.6 and

graphically in Figure 6.14. Although the restriction window results in a similar compression ratio for each sound, the associated degradation varies considerably. The

sounds which achieve the best performance and which give better

compression/degradation figures than those theoretically expected for requantisation

are wind noise and some other ambient environmental sounds such as the industrial

roomtone and the city skyline.

6.5. Conclusions

In this chapter, the FIF inverse problem for sound time series has been considered.

That is, given an original sound, find a subset of its time series samples that, when

used as interpolation points, define an FIF that is close to the original. In the first

section of this chapter, a number of heuristic experiments were presented based on the

idea of extracting information from the coarsest scale of statistically self-affine time

series. This information was then used to specify the interpolation points that define

an FIF. The experiments with this technique, however, have not generated any FIFs

that sound enough like the original for the technique to be of any use. It was

concluded that a more rigorous technique is required to see if the FIF inverse problem

has any solutions for sound time series.

The rest of the chapter has then concentrated on a search technique found in the

literature which was devised by David Mazel. This, it has been claimed, solves the

FIF inverse problem by partially searching the space of subsets of original time series

samples. It has been shown, however, that the performance quoted for the inverse

algorithm, and for a number of other more complicated FIF-based models/inverse

algorithms, is no better than the theoretically expected performance of simple

amplitude requantisation. That is, for a given degree of compression, the signal

degradation introduced by the FIF model is greater than, or approximately equal to,

that introduced by requantising the sample amplitudes. The inverse algorithm has

been reimplemented and used to process a wide variety of sound time series and to

obtain average results. The results from this have confirmed the fact that Mazel's self-

affine model and inverse algorithm have poor performance relative to requantisation.

It has also been concluded that the good result reported by Mazel for the mountain

profile data is an exception and not typical of the FIF model/inverse algorithm.


The realisation that Mazel's models/algorithms are no better than requantisation

has prompted a number of experiments to modify the self-affine inverse algorithm. By

inspection of the workings of the algorithm, it was found that the poor performance is

a result of the uneven spacing of the chosen interpolation point pairs. A novel version

of the algorithm which eliminates this problem has been demonstrated, and this

produces some results that perform much better than amplitude requantisation. The

sound time series for which the best results have been obtained are those of wind

noise, industrial roomtone and a city skyline. Since the performance figures are

considerably better than for requantisation, it can be concluded that the FIF model is

suitable for these particular sounds and that there is something inherent to them that

the algorithm is exploiting to achieve relatively low-degradation compression. Since

the FIF model is based on representing complex fractal waveforms with simple

systems, it can be concluded that the model/inverse algorithm works by exploiting some

of the fractal redundancy present in the original signals. This conclusion is confirmed

by considering that in Chapter 4 it was shown that both wind noise and the industrial

roomtone are examples of 1/f noises which are necessarily statistically self-affine

signals. The success of the modified algorithm is therefore a satisfying result since it

both confirms that these signals do have fractal properties and demonstrates that these

properties can be exploited by a fractal model.

The inverse algorithm, however, has not been designed to be computationally

efficient, nor does it provide optimal results. Since the algorithm only searches part of

the space of possible solutions, there is still room for improving the performance of

the algorithm. This could be achieved with a complete search which, however, would

be very computationally intensive, or by using a global optimisation technique, such

as a genetic algorithm. Also, no attempt has been made to reimplement or modify

Mazel's other more complicated models/algorithms. There is therefore potential for

further work on FIF based models for sound, and the results presented in this chapter have indicated the value of pursuing this line of research.


Chapter 7

Chaotic Predictive Modelling

The previous two chapters have concentrated on the problem of modelling a sound

by representing the graph of its time domain waveform with a strange attractor. Recall

from Chapter 4 that the other suggested approach is to represent the dynamics of the

sound with a strange attractor. This approach is explored in this and the next chapter.

As has been established in Chapter 2, a sound is presumed to be represented with

digital audio and is therefore already in a form directly compatible with the FIF

models considered in Chapters 5 and 6 which represent the time series waveform. In

order to model the dynamics of the sound, however, it is necessary to develop a

suitable further representation. This will involve considering the relationship between

a chaotic system and a time series derived from it. The beginning of this chapter is

devoted to this issue and concludes by describing an analysis/synthesis model

requiring the solution of a specific inverse problem. The rest of this chapter then

focuses on an approach to this problem inspired by work on time series prediction.

The following chapter presents a related idea that is found to solve the roomtone

problem.

7.1. Chaotic Time Series

The founding assumption of this approach is that a chaotic system is responsible

for the sound that is to be modelled. Because the sound is in the form of digital audio

and therefore the model is to operate in the discrete time domain, it is convenient to

begin with a discrete dynamical system defined by a mapping:

x_{n+1} = F(x_n), \qquad x_n \in X \subseteq \mathbb{R}^d, \quad n \in \mathbb{Z}    (7.1)

or

x_n = F^n(x_0)    (7.2)

where x is the d-dimensional state vector in state space X, n is discrete time, F is an

invertible nonlinear mapping, and let x0 be the initial condition of the system. Assume

that this system possesses a strange attractor, A, and an associated physical measure, \mu.

Recall that the attractor represents the long term dynamical behaviour of the system as


it is the set on which any typical trajectory of the system will lie, after transients. That

is,

x_n \in A \subset X \quad \text{for sufficiently large } n    (7.3)

For the rest of this chapter, n is considered to be any time value sufficiently large for

transients to have passed. Because of the unpredictable nature of chaos, the state can

be interpreted as a random vector. The physical measure then describes the

probabilistic distribution of states on the attractor. Let this system be known as the

original system and denote it by

(x, X, F, A, \mu)    (7.4)

that is,

system = (state vector, state space, system mapping, attractor, associated measure)

In general, consider that not much will be known about this system as the state and

the mapping are not directly accessible. It is therefore not possible to construct,

directly, a physical model for the sound. Instead, assume the only source of

information available is the sound in the form of a digital audio time series. Consider

that this is generated by observing the original system through an observation

function,

RR

d

nn

o

xou

:(7.5)

where u is the time series and o is the observation function. The observation function

could represent, for example, the process of monitoring a sound field at a single point

in space with a microphone.

The interpretation of the state as a random vector implies that the observations

may themselves be viewed as random variables. Consequently, the time series, u, may

be interpreted as a realisation of a stochastic process,

\{U_n, \ n \in \mathbb{Z}\}    (7.6)

Because a natural measure is assumed, recall that this has been defined as one that is

invariant under the mapping F. The distribution of states at some time n is then the

same as that at time n+1 and so the distribution of the random variable Un is the same

as that of U_{n+1}. Hence the stochastic process will be stationary [tayl91].

To summarise so far: the above provides a general model for how a digital audio

time series results from recording, or observing, a chaotic system. The observing

process is summarised by the function o and the result, u, may be viewed as the

realisation of a stationary stochastic process. The objective, however, is to model the


sound by representing its dynamics. Ideally, this would involve modelling the attractor

and associated measure of the original system. Since the original system is not directly

accessible, however, it is necessary to reverse the observing process and gain

information about the original system from the observed time series. This may be

done with a technique known as embedding.

7.2. Embedding

An embedding is a mapping with certain properties that allows one dynamical

system to be mapped to another while preserving essential features in the process

[take81, broo91]. In particular, such a mapping is continuous, differentiable and

invertible (a diffeomorphism). Consequently, each point in the original state space

maps to a unique point in the embedded state space, neighbouring points mapping to

neighbouring points, trajectories to trajectories, and therefore attractors map to

attractors. The properties of the embedding ensure that both the topology of the

attractor and the associated probability structure are preserved.

Define a (column) vector to be composed of m consecutive values of the time

series u,

y_n = (u_n, u_{n-1}, u_{n-2}, \ldots, u_{n-m+1})^T    (7.7)

This is equivalent to a vector of observations of the state vector, x, and so, using Equations (7.2) and (7.5), it may be rewritten as

y_n = (o(x_n), o(x_{n-1}), \ldots, o(x_{n-m+1}))^T = (o(x_n), o(F^{-1}(x_n)), \ldots, o(F^{-(m-1)}(x_n)))^T    (7.8)

and therefore this may be viewed as a mapping of the state vector,

y_n = H(x_n)    (7.9)

Generally, the mapping, H, is itself an embedding when [take81]

m \ge 2d + 1    (7.10)

It will therefore map the inaccessible states of the original system onto accessible

states of what is known as the embedded system. Define another mapping, G, that

comprises a shift on a sequence of observations as viewed through an m-length

register. That is,

G(u_n, u_{n-1}, \ldots, u_{n-m+1}) = (u_{n+1}, u_n, \ldots, u_{n-m+2})    (7.11)

which, again, may be rewritten as,

G(o(x_n), o(F^{-1}(x_n)), \ldots, o(F^{-(m-1)}(x_n)))^T = (o(F(x_n)), o(x_n), \ldots, o(F^{-(m-2)}(x_n)))^T    (7.12)

and therefore,


G(H(x_n)) = H(F(x_n))    (7.13)

This shows that the evolution of the original state under the mapping F is

equivalent to the evolution of the embedded state under the shift mapping G [tayl91].

An equation can therefore be written for the embedded system that relates consecutive

states with G,

y_{n+1} = G(y_n)    (7.14)

Write the embedded system in full as

(y, Y, G, B, \nu)    (7.15)

where,

Y = H(X) \subset \mathbb{R}^m    (7.16)

is the embedded state space and m the embedding dimension. This system has an

attractor

B = H(A)    (7.17)

and also the probability of the embedded state being in some subset, b, of the attractor

is given by

\nu(b) = \mu(H^{-1}(b)), \qquad b \subseteq B    (7.18)

In words: the probability of finding the embedded state in some subset, b, of the

embedded attractor may be found by mapping the subset back into the original system

with the inverse of the embedding mapping and then finding the probability of the

original state being in this mapped subset.

If the embedded system is itself observed by a projection of one coordinate of the

embedded vector, the original observed time series will result. That is,

u_n = p(y_n)    (7.19)

where

p(y) = y_1, \qquad y = (y_1, y_2, \ldots, y_m)^T    (7.20)

is the projection function.

To summarise, the embedding procedure enables the reconstruction of the original

system by mapping inaccessible states and trajectories onto an accessible embedded

state space. The original and embedded systems may be viewed as equivalent systems

that generate exactly the same time series via different observation functions. The

embedding procedure may therefore be seen as a means of representing the dynamics

of a sound from its digital audio time series with an attractor and measure in


embedded state space. Given that the time series may be viewed as the realisation of a

stochastic process, the embedding procedure creates a representation that preserves

this viewpoint. To see this, consider the mth order joint probability density function

(jpdf)

P_U(u_n, u_{n-1}, \ldots, u_{n-m+1})    (7.21)

that partly describes the stationary process, \{U_n\}. This may be approximated with an

m-dimensional histogram of the occurrences of the m-tuples,

(u_n, u_{n-1}, \ldots, u_{n-m+1})    (7.22)

This is exactly the same procedure as would be necessary to approximate the

measure of the embedded system, \nu, by accumulating occurrences of the m-

dimensional embedding vectors. The jpdf and the embedded measure are therefore

equivalent descriptions of the time series as a stationary stochastic process.
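The embedding step itself is simple to state in code. The sketch below is an illustration only (not taken from the thesis software); it forms the delay vectors of Equation (7.7) and accumulates a coarse m-dimensional histogram as an approximation to the embedded measure, equivalently the mth order jpdf of Equation (7.21):

    import numpy as np

    def embed(u, m):
        """Delay vectors y_n = (u_n, u_{n-1}, ..., u_{n-m+1}), one row per
        admissible n, as in Equation (7.7)."""
        u = np.asarray(u, dtype=float)
        return np.stack([u[m - 1 - k: len(u) - k] for k in range(m)], axis=1)

    def measure_histogram(u, m, bins=8):
        """Coarse m-dimensional histogram of the embedded vectors: an
        approximation to the embedded measure, equivalently the mth order jpdf
        of Equation (7.21)."""
        y = embed(u, m)
        hist, _ = np.histogramdd(y, bins=bins)
        return hist / hist.sum()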

Having established a suitable representation for the dynamics of a sound with an

embedded attractor and measure, the objective is now to fit this within a possible

analysis/synthesis model for the sound.

7.3. The Analysis/Synthesis Model

Figure 7.1 shows the proposed analysis/synthesis model which may be compared

with the general sound model shown in Figure 2.2. The analysis involves embedding

the original time series to form an accessible representation of the dynamics of the

original system in the form of the embedded attractor and measure. From this is

constructed a synthetic system, also in embedded state space, that has a similar

attractor and measure to that of the embedded system. Synthesis then consists of

iterating the synthetic system and observing it, via the projection function, to generate

a synthetic time series. Crucial to the analysis procedure, and described here as the inverse problem, is the construction of the synthetic system. Since the embedded system is defined by the mapping G, the inverse problem may be recast as finding a similar mapping for the synthetic system; let this be denoted \tilde{G}. A trivial solution to the inverse problem is to make \tilde{G} = G. The synthetic system would then be expected

to exactly model the original time series. Equation (7.11) shows, however, that G is

defined by the original time series and so the problem becomes: represent a time series

with a mapping that is defined by that time series. This is a trivial problem and clearly

of no use.


Figure 7.1 The proposed analysis/synthesis model based upon the embedded attractor and measure representation of a sound time series.

A more realistic and useful inverse problem is: find a \tilde{G} that is an approximation to G given only a finite sequence of the original time series. Preferably, \tilde{G} should be

as simple as possible, and yet define a system whose attractor and measure adequately

match those of the embedded system. By adequate, it is meant that the model

preserves qualities of the sound so as to maintain perceptual similarity between the

original and synthetic versions. Note that no attempt is being made to exactly model

the original time series in this case, only the dynamics of the system responsible for it.

Consequently, a typical sequence of iterates of the synthetic system will not match

those of the embedded system, but will lie on a similar attractor.

More formally, let the synthetic system be,

(z, Y, \tilde{G}, \tilde{B}, \tilde{\nu})    (7.23)

i.e.

z_{n+1} = \tilde{G}(z_n)    (7.24)

and let this system produce a time series via the projection observation function,

v_n = p(z_n)    (7.25)

which again may be viewed as the realisation of a stationary stochastic process,

\{V_n\}    (7.26)

with an mth order jpdf

11 ,,, mnnnV vvvP (7.27)

The inverse problem is to find a G~

such that

GG ~

(7.28)


and then

$\tilde{B} \approx B$ and $\tilde{\mu} \approx \mu$ (7.29)

and it follows that

$P_V \approx P_U$ (7.30)

This shows that a solution of the inverse problem would allow the original time

series to be modelled not exactly, but such that the synthetic time series is statistically

similar to the original. More precisely, they will share similar mth order jpdfs.

Whether or not this is adequate to preserve the perceived qualities of the sound must

be determined by experimental investigation.

Because of the shifting nature of G, see Equation (7.11), and because it is a deterministic function of the embedded state vector, see Equation (7.14), it may be rewritten as

$G\left([u_n, u_{n-1}, \ldots, u_{n-m+1}]^T\right) = [g(\mathbf{y}_n), u_n, \ldots, u_{n-m+2}]^T$ (7.31)

where

$g: \mathbb{R}^m \rightarrow \mathbb{R}$ (7.32)

is a vector to scalar function. The inverse problem therefore reduces to approximating g, i.e. finding $\tilde{g}$. To see how this may be done, consider embedding the original time series and forming data pairs comprising an embedded vector at time n and the value of the time series at time n+1,

$(\mathbf{y}_n, u_{n+1})$ (7.33)

These data pairs satisfy

$g(\mathbf{y}_n) = u_{n+1}$ (7.34)

by definition - see Equation (7.31). Assume that a sequence of N samples is taken from the original time series (after transients) and is available for embedding. Let these be written

$\{u_i\}_{i=0}^{N-1}$ (7.35)

These may also be formed into a set of data pairs,

$\{(\mathbf{y}_i, u_{i+1})\}_{i=m-1}^{N-2}$ (7.36)

Note that only N-m pairs can be formed from N samples. An approximation to g may be found with a function that is satisfied by only the N-m data pairs:

$\tilde{g}(\mathbf{y}_i) = u_{i+1}, \quad i = m-1, \ldots, N-2$ (7.37)

which is a problem of function interpolation. This provides a basis for finding $\tilde{g}$ by specifying its value at particular places, but not elsewhere. This condition may therefore be relaxed slightly by replacing the equality in (7.37) to give


$\tilde{g}(\mathbf{y}_i) \approx u_{i+1}$ (7.38)

and still maintaining that

$\tilde{g} \approx g$ (7.39)

To summarise, this section has proposed an original analysis/synthesis model for

sound based on an embedded attractor/measure representation of a digital audio time

series. This model relies on the solution of an inverse problem which reduces to one

of function interpolation/approximation.
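As an illustration of the embedding step that underlies this inverse problem, here is a minimal sketch (Python/NumPy; illustrative names, not the implementation used in this work) that forms the N-m data pairs of Equation (7.36), pairing each embedded vector $\mathbf{y}_i = [u_i, u_{i-1}, \ldots, u_{i-m+1}]^T$ with the next sample $u_{i+1}$.

```python
import numpy as np

def embed_pairs(u, m):
    """Form the N-m data pairs (y_i, u_{i+1}) of Equation (7.36),
    where y_i = [u_i, u_{i-1}, ..., u_{i-m+1}] (most recent sample first)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    # Embedded vectors y_i for i = m-1 ... N-2, one per row.
    Y = np.column_stack([u[m - 1 - k : N - 1 - k] for k in range(m)])
    targets = u[m:]           # the corresponding next samples u_{i+1}
    return Y, targets         # Y.shape == (N - m, m)
```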

7.4. The Inverse Problem

Several strategies for the solution of a similar problem have been proposed in

work concerned with the accurate short-term prediction of time series believed to have

come from chaotic systems [broo91, farm87, casd89, tayl91, sing92]. The sound

model proposed is in fact an extension of this work, being based on ideas from it, but

having a different purpose. In the prediction work the emphasis is on finding a

function, known as the forecast, or prediction, function, which is of the same form as

Equation (7.37) and which enables as accurate a prediction of a given time series as

possible. As a consequence of this, the strategies used are not ideally suited for use in

the proposed sound model. The following is a discussion of the reasons for this and

concludes with the need for a different approach which is then developed in the next

section.

Firstly, the strategies are most concerned with accurate prediction which is

equivalent to recreating a given sequence of the original time series as closely as

possible. Recall that the intention is for the sound model to capture the form of the

dynamics of a given time series, and not recreate it exactly. As a consequence of the

emphasis on accuracy, there is little interest in producing a function that is simple to

describe. For the sound model, a different situation is sought: a function that is as

simple as possible, to allow the model to be conveniently parameterised, but that can

adequately capture the perceptually important properties of the sound. This is not to

say that the function sought should not be accurate at forecasting, but that there is a

trade-off between accuracy and simplicity that will be determined by the different

requirements of the sound model. Also, when the concern is for short-term

forecasting, there is no consideration of the computational cost of iterating the

forecasting function, as this will only be done a relatively small number of times - up

to the order of 100. For the generation of a sound, however, the synthetic system will


be expected to produce ~50,000 values for each second of output. There is therefore a

difference of several orders of magnitude between the number of iterations required.

Finally, this leads to another consideration, that of stability. There is no concern in the

forecasting work for whether iteration of the forecasting function produces a stable

chaotic system.

There are two forms of prediction function found to have been used in the

literature: global and local. The global functions are single nonlinear functions, such

as polynomials or radial basis functions, which are defined over the whole embedded

state space. The local functions are piece-wise linear, or low-order polynomials whose

domains are defined according to a nearest neighbour criteria. That is, to predict the

future behaviour of any given vector in embedded state space, a number of nearest

neighbours are found from the embedded sequence, a linear or low-order polynomial

is fitted to satisfy Equation (7.38), and the resulting function used to predict the next

vector.

Global nonlinear prediction functions are known to be much more costly to

compute, especially for large values of N and m, than their piece-wise linear

counterparts [casd89]. Also, the piece-wise linear functions have strong similarities to

affine IFS. Recall the intention presented in Chapter 4 to concentrate on IFS as they

are a well understood means of manipulating strange attractors (this will be discussed

again in the Further Work section later in this chapter). For these reasons, piece-wise

linear functions are a preferable choice. As mentioned, however, the natural way in

which they are used in the prediction work involves calculating the domain of each

linear section with a nearest neighbour method. This is a computationally intensive

process [broo91] that is required once per iteration of the prediction function, the

complete prediction function not being calculated once in advance, as in the proposed

sound model. It is not obvious how this technique could be adapted so as to achieve

this.

What is therefore required is a new solution to the inverse problem that is more

suitable for use with the proposed sound model. Preferably this would result in the specification of an easily implemented, piece-wise linear function to be used as $\tilde{g}$ in

the synthetic system. The next section presents such a solution and is followed by a

number of experimental results that reveal some of the capabilities of the resulting

sound model.


7.5. A Solution to the Inverse Problem

The inverse problem is now: given the set of data pairs in Equation (7.36), find a piece-wise linear function, $\tilde{g}$, that satisfies Equation (7.38). The full specification of the piece-wise linear function requires two parts: the partition of embedded state space that defines a set of disjoint domains for the individual linear functions, and the linear functions themselves. Let the partition into domains be a set of subsets of the embedded state space,

$\{D_j\}_{j=1}^{Q}$ (7.40)

such that

$\bigcup_{j=1}^{Q} D_j = Y$ (7.41)

and

$D_j \cap D_{j'} = \emptyset, \quad j \neq j'$ (7.42)

For each domain, define a linear function,

$l_j(\mathbf{y}) = \mathbf{a}_j \cdot \mathbf{y} + b_j = [a_{j1}, a_{j2}, \ldots, a_{jm}]\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} + b_j$ (7.43)

Then define

$\tilde{g}(\mathbf{y}) = l_j(\mathbf{y}) \quad \text{when } \mathbf{y} \in D_j$ (7.44)

i.e. $\tilde{g}$ being one of the linear functions depending on which domain the state is in.

The problem may therefore be split into two halves: firstly, construct the partition

and secondly, fit a linear function within each domain to those data pairs whose

embedded vector components fall within that domain. Each linear function in

Equation (7.43) must be made so as to best satisfy Equation (7.38) for those

embedded vectors contained within each domain. Since Equation (7.43) describes an

m-dimensional hyperplane, it can exactly interpolate m+1 data pairs or best fit a

greater number. The partition, therefore, must divide the set of data pairs in Equation

(7.36) so that there are at least m+1 in each domain.

The scheme created achieves this task by recursively dividing the set of data pairs

into two subsets containing approximately equal numbers. This is done until any

further subdivision would reduce the number of data pairs in a set to less than a

prespecified minimum, made to be at least m+1. Each successive division is with


respect to a different coordinate of the embedded state space, the coordinate

incrementing with each level of recursion. This generates an m-dimensional search

tree of the type used in multidimensional range searching [sedg83] and effectively

divides the embedded state space into a set of Q hypercuboid domains. In more detail:

The set of data pairs,

$\{(\mathbf{y}_i, u_{i+1})\}_{i=m-1}^{N-2}$ (7.45)

is divided into two subsets according to

$S_0 = \{(\mathbf{y}_i, u_{i+1}) : y_{i1} < c_1^1\}$ and $S_1 = \{(\mathbf{y}_i, u_{i+1}) : y_{i1} \geq c_1^1\}$ (7.46)

where $y_{i1}$ denotes the first component of $\mathbf{y}_i$, and $c_1^1$ is chosen to make the number of points in the two sets as close as possible,

$\#S_0 \approx \#S_1$ (7.47)

This effectively partitions Y with the hyperplane,

$y_1 = c_1^1$ (7.48)

This process is then repeated at the next level of recursion so that $S_0$ is divided into

$S_{00} = \{(\mathbf{y}_i, u_{i+1}) \in S_0 : y_{i2} < c_2^1\}$ and $S_{01} = \{(\mathbf{y}_i, u_{i+1}) \in S_0 : y_{i2} \geq c_2^1\}$ (7.49)

and $S_1$ into

$S_{10} = \{(\mathbf{y}_i, u_{i+1}) \in S_1 : y_{i2} < c_2^2\}$ and $S_{11} = \{(\mathbf{y}_i, u_{i+1}) \in S_1 : y_{i2} \geq c_2^2\}$ (7.50)

The recursion continues until any further subdivision would violate

$\#S \geq M \geq m + 1$ (7.51)

where M is the prespecified minimum number of points per domain.

Note that at each level of recursion, the division is with respect to a different component of $\mathbf{y}_i$, the index of which increments and wraps around to 1 after m.

The resulting set of c's then forms both the boundary values of the hypercuboid domains, $D_j$, and a search tree which is used to determine the domain in which any vector in embedded state space is located. A simple example of the form of partition resulting from this process is shown in Figure 7.2, which is of an m=2 dimensional embedded state space.


[Figure: left, the rectangular range of embedded vectors recursively divided into hypercuboid domains by boundaries $c_1^1, \ldots, c_1^5$ in the first coordinate and $c_2^1, \ldots, c_2^{10}$ in the second; right, the associated search tree.]

Figure 7.2 Left, an example recursive partition for m=2 and right, the associated search tree.

A fitting error for each domain, j, can then be written as

$e_j = \sum_{i:\, \mathbf{y}_i \in D_j} \left(l_j(\mathbf{y}_i) - u_{i+1}\right)^2$ (7.52)

which is the sum, over all the data pairs contained within the jth domain, of the

squared difference between the value according to the hyperplane being fitted and the

actual value given by the data pair. The best fit is therefore obtained by minimising

this error function with respect to the mapping parameters which is a standard linear

least-squares problem [nag91].
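The two halves of this solution can be sketched as follows (a hypothetical Python/NumPy illustration, not the implementation used in the experiments): a recursive, coordinate-cycling median split of the data pairs into domains holding at least M points, a least-squares hyperplane $l_j(\mathbf{y}) = \mathbf{a}_j \cdot \mathbf{y} + b_j$ fitted within each resulting domain, and a lookup routine that descends the search tree.

```python
import numpy as np

def _fit_hyperplane(Y, targets):
    """Least-squares fit of l(y) = a.y + b to the data pairs in one domain."""
    A = np.column_stack([Y, np.ones(len(Y))])
    coeffs, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return {"a": coeffs[:-1], "b": coeffs[-1]}

def build_partition(Y, targets, M, depth=0):
    """Recursively split the data pairs by the median of one embedding
    coordinate per level (cycling through the coordinates), stopping when a
    further split would leave a domain with fewer than M pairs. Returns a
    nested dict acting as the search tree; leaves hold the (a, b) of l_j."""
    n, m = Y.shape
    if n < 2 * M:
        return _fit_hyperplane(Y, targets)
    coord = depth % m                        # coordinate used at this level
    c = np.median(Y[:, coord])               # split value (the 'c' of Eq. 7.46)
    left = Y[:, coord] < c
    if min(left.sum(), (~left).sum()) < M:   # degenerate split: stop here
        return _fit_hyperplane(Y, targets)
    return {"coord": coord, "c": c,
            "lo": build_partition(Y[left], targets[left], M, depth + 1),
            "hi": build_partition(Y[~left], targets[~left], M, depth + 1)}

def lookup(tree, y):
    """Descend the search tree to the domain containing y and return (a, b)."""
    while "coord" in tree:
        tree = tree["lo"] if y[tree["coord"]] < tree["c"] else tree["hi"]
    return tree["a"], tree["b"]
```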

7.6. Experimental Technique

The program implemented to test the proposed sound model consists of the

following steps:

Analysis

- Input the original time series u.
- Embed the time series to form the set of data pairs $\{(\mathbf{y}_i, u_{i+1})\}_{i=m-1}^{N-2}$.
- Recursively subdivide the data pairs to form sets S sorted into domains D that partition the embedded state space. Also formed is the search tree.
- Fit within each domain D a linear function, l, to the data pairs S. Each linear function is parameterised by a and b. The fit uses least squares to minimise the error e.


The search tree and the set of l then define the function $\tilde{g}$ and hence the mapping $\tilde{G}$.

Synthesis

- Initialise the synthetic system by setting $z_0$ equal to one of the embedded vectors.
- Use the search tree to find which domain the synthetic vector z is located in.
- Apply the linear mapping associated with that domain and hence calculate
  $z_{n+1} = \tilde{G}(z_n)$ (7.53)
- Observe the synthetic vector with the projection function to get the synthetic time series,
  $v_n = p(z_n)$ (7.54)
- Iterate the above three steps as many times as is desired or is possible (a sketch of this loop is given below).
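As flagged in the last step above, a compact sketch of the full analysis/synthesis procedure might look as follows (Python, illustrative only; it assumes the embed_pairs, build_partition and lookup helpers sketched in the previous sections).

```python
import numpy as np

def synthesise(u, m, M, n_out):
    """Analyse the original series u, then iterate the synthetic system
    to generate n_out output samples (Equations 7.53 and 7.54)."""
    Y, targets = embed_pairs(u, m)            # analysis: data pairs (7.36)
    tree = build_partition(Y, targets, M)     # analysis: partition and fits
    z = Y[0].copy()                           # initialise with an embedded vector
    out = np.empty(n_out)
    for n in range(n_out):
        a, b = lookup(tree, z)                # which domain is z in?
        next_sample = a @ z + b               # local linear map l_j(z)
        z = np.concatenate(([next_sample], z[:-1]))   # shift the state vector
        out[n] = next_sample                  # observe the first component
    return out
```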

The variables that define the analysis are summarised as follows:

N - the length, in samples, of the original time series used for the analysis,

m - the embedding dimension,

M - the minimum number of points to result per domain after partitioning,

Q - the number of domains that form the partition - determined by N and M

and the nature of the data.

In order to experimentally evaluate the performance of this model, consider the

following criteria:

1) stability - does the synthetic system remain stable over the desired period of

iteration so that an output is generated?

2) predictability - since $\tilde{g}$ is a forecasting function, how accurately can it make

predictions of the original time series?

3) attractor similarity - if the synthetic system is stable, does a resulting trajectory

lie on an attractor and have a distribution (the associated measure) that are similar to

the embedded ones?

4) time series similarity - how do the original and synthetic time series compare?

5) sound similarity - of greatest importance, does the synthetic time series, v, when

converted to audio, sound like the original?


6) model complexity - how complex are the analysis and synthesis processes and

how many parameters are required to specify the synthetic system?

The following discusses these criteria in greater detail and develops an

experimental approach for testing the sound model.

1) Stability. The stability of the system is determined by whether or not the

synthetic vector, z , remains bounded within a predefined range. Since the input and

output time series are represented as 16-bit integers, this provides a natural range. All

the original time series used do not fill the full dynamic range and therefore leave

some headroom for the synthetic system to operate within. If the synthetic time series

exceeds this range, then the system is considered to have become unstable and the

synthesis process is terminated.

2) Predictability. The accuracy of the prediction function may be calculated by

comparing a portion of synthetic time series with that of the original. In the prediction

literature this is done as follows [farm87]. The original time series is split into two

pieces; the first is embedded and used for the analysis, the second is used for the error

calculation. This second piece is divided into I sequences of length L+m, one

sequence for each trial. A trial consists of initialising the synthetic system with the

first m values of the sequence and then iterating the system L times to produce

predictions for L times ahead. Let

$\{u_n : 1 \leq n \leq N\}$ (7.55)

be the N samples of the original time series used for the analysis, and

$\{u_{i,1}, u_{i,2}, \ldots, u_{i,L+m}\}, \quad 1 \leq i \leq I$ (7.56)

be the I trial sequences. Let

$\mathbf{z}_{i,0} = [u_{i,m}, u_{i,m-1}, \ldots, u_{i,1}]^T$ (7.57)

be the initial value of the synthetic vector for the ith trial. Then iterate

$\mathbf{z}_{i,n+1} = \tilde{G}(\mathbf{z}_{i,n})$ (7.58)

L times and observe it via p to give the synthetic series

$\{v_{i,1}, v_{i,2}, \ldots, v_{i,L}\}, \quad 1 \leq i \leq I$ (7.59)

The prediction error as a function of the time ahead, j, is then defined as

$\sigma(j) = \dfrac{\left[\frac{1}{I}\sum_{i=1}^{I}\left(u_{i,m+j} - v_{i,j}\right)^2\right]^{1/2}}{\left[\frac{1}{N}\sum_{n=1}^{N}\left(u_n - \bar{u}\right)^2\right]^{1/2}}$ (7.60)


where

$\bar{u} = \frac{1}{N}\sum_{n=1}^{N} u_n$ (7.61)

is the mean of the portion of the original time series used for embedding. The numerator of Equation (7.60) gives the r.m.s. prediction error averaged over the I trials and the denominator is the standard deviation of the portion of the original time series used by the analysis. It can be seen that this denominator normalises the prediction error so that $\sigma(j) = 0$ for perfect predictions and $\sigma(j) = 1$ if the predictions are constantly made equal to the mean of the original.
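A sketch of this error calculation is given below (Python, illustrative; it assumes the lookup helper from the earlier partition sketch, and trials holds the I held-out sequences of length L+m).

```python
import numpy as np

def prediction_error(u_analysis, trials, tree, m, L):
    """Normalised prediction error sigma(j), j = 1..L (Equation 7.60).
    trials has shape (I, L + m); each trial is initialised from its first
    m samples and predicted L steps ahead."""
    I = len(trials)
    sq_err = np.zeros(L)
    for trial in trials:
        z = trial[:m][::-1].copy()            # z_{i,0}: most recent sample first
        for j in range(L):
            a, b = lookup(tree, z)
            v = a @ z + b                     # prediction (j+1) steps ahead
            sq_err[j] += (trial[m + j] - v) ** 2
            z = np.concatenate(([v], z[:-1]))
    rms = np.sqrt(sq_err / I)                 # numerator of (7.60)
    return rms / np.std(u_analysis)           # normalise by the analysis portion
```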

3) Attractor similarity. As already explained, the long-term trajectory of the

synthetic system is not expected to be similar to that of the embedded system, but, if

the model is good, is expected to lie on a similar attractor and the states be similarly

distributed. The most immediate way of making a comparison is by visual inspection

of the phase portraits derived from the original and synthetic time series. Since a

phase portrait and an embedding are related through the method of taking delayed

versions of the time series, the portrait gives a direct view of the domain in which the

model is working. The phase portrait is considered to be sufficient to show if the

synthetic system possesses an attractor of similar form to the embedded one. A

quantitative comparison would involve calculating the 'closeness' of the embedded

and synthetic attractors/measures. Such a closeness function is provided by the

Hausdorff and Hutchinson metrics [barn88] which give distances between subsets and their measures, respectively, of the embedded state space. So let

$d_{HA}(B, \tilde{B})$ (7.62)

be the distance between the embedded and synthetic attractors and let

$d_{HU}(\mu, \tilde{\mu})$ (7.63)

be the distance between their associated measures. In practice, these distances must be estimated from sample trajectories lying on the attractors. It is not known, presently, how to do this, but instead consider the following. Recall from Equation (7.28) that the assumption is made that if

$\tilde{G} \approx G$ (7.64)

which is equivalent to

$\tilde{g} \approx g$ (7.65)

then

$\tilde{B} \approx B$ and $\tilde{\mu} \approx \mu$ (7.66)

So, define another metric


$d_{\text{map}}(g, \tilde{g})$ (7.67)

that quantifies the similarity between the embedded and synthetic mappings, and propose that minimising this will minimise both (7.62) and (7.63). Now, it can be seen from Equation (7.60) that $\sigma(1)$ is an estimate of $d_{\text{map}}$ as it is an average over I trials of the difference between

$u_{i,m+1} = g(\mathbf{z}_{i,0}) = g([u_{i,m}, u_{i,m-1}, \ldots, u_{i,1}]^T)$ (7.68)

and

$v_{i,1} = \tilde{g}(\mathbf{z}_{i,0}) = \tilde{g}([u_{i,m}, u_{i,m-1}, \ldots, u_{i,1}]^T)$ (7.69)

That is, it is the difference between the value of the embedded system function g, at some point given by the trial vector, and the value of the synthetic approximation, $\tilde{g}$,

at that same point. So, instead of directly calculating the closeness of the embedded

and synthetic attractors/measures to determine the accuracy of the model, it is

proposed that the prediction error for one time step ahead gives a measure of the

expected closeness.

4) Time series similarity. Also recall from Section 7.3 that if the synthetic

attractor/measure is close to the embedded one, then the synthetic time series should

be statistically similar to the original. i.e. it is expected that

$P_V \approx P_U$ (7.70)

Since these m-dimensional jpdfs are equivalent to the m-dimensional measures,

the same problem exists of not having a means to practically compare them. Instead,

however, it is possible to easily estimate the one-dimensional pdf by considering that,

because a natural measure is presumed, the processes are stationary and ergodic, and

so an estimate of the amplitude distributions of the processes is an estimate of the

pdfs. An estimate of the amplitude distribution is found by calculating a histogram of

the relative frequencies that the sample amplitudes take. That is, divide the amplitude

range of the time series into a number of bins and calculate the number of times the

sample amplitude falls within each bin divided by the total number of samples used.
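A minimal sketch of this estimate (Python/NumPy, illustrative) is:

```python
import numpy as np

def amplitude_pdf(x, bins=100):
    """Relative-frequency histogram of sample amplitudes: a pdf estimate."""
    counts, edges = np.histogram(x, bins=bins)
    return counts / len(x), edges
```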

5) Sound similarity. This necessarily requires a subjective comparison of both the

original and synthetic time series when converted into sound. As well as presenting a

report of my own opinion on the comparison, many of the sounds are included on the

accompanying cassette tape so that readers of this thesis may judge for themselves.

6) Model complexity. An assessment of the model complexity may be divided into

two aspects. Firstly, the computational complexity of the analysis and synthesis

processes that determine the time taken to generate the synthetic system and the time

taken to generate the synthetic time series. Secondly, the number of parameters


required to describe the synthetic system. This second aspect is of greater importance

to this work than the first since the emphasis is on determining whether or not simple

chaotic systems can represent complex sounds and not on the computational

efficiency of such techniques. The synthetic system is defined by the mapping $\tilde{g}$, which in turn is defined by the partition and the associated set of locally linear functions. The number of parameters required to represent the partition is approximately equal to the number of partition domains, Q. Each of the Q linear maps, $l_i$, is defined by an m-dimensional vector $\mathbf{a}_i$ and a scalar $b_i$. The total number of parameters that define the mapping is therefore given by

$P(\tilde{g}) = Q + Q(m + 1) = Q(m + 2)$ parameters (7.71)

In practice, for the particular implementation of the model used in the experiments, the partition is defined by a search tree of approximately 2Q nodes, each of which requires 7 bytes of storage. The mapping parameters each use 8 bytes. The system complexity in bytes is therefore given by

$B(\tilde{g}) = 14Q + 8Q(m + 1) = Q(8m + 22)$ bytes (7.72)

These two measures quantify the complexity of the synthetic system seen from the

view of user manageability (number of parameters) and computer storage (number of

bytes).
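For reference, both measures follow directly from m and Q; a small sketch (illustrative, with the storage assumptions stated above) is:

```python
def model_complexity(m, Q):
    """Parameter count P and storage B in bytes of the synthetic system,
    following Equations (7.71) and (7.72)."""
    P = Q + Q * (m + 1)            # ~Q partition values plus Q*(m+1) map parameters
    B = 14 * Q + 8 * Q * (m + 1)   # 2Q tree nodes at 7 bytes, 8 bytes per map parameter
    return P, B

# Example from the Lorenz experiment: m = 7, Q = 128 gives P = 1152, B = 9984.
print(model_complexity(7, 128))
```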

The following experiments divide into two parts. Firstly, some tests are carried out

on the model with an artificially generated original time series. The intention is to

confirm that the model performs in accordance with the theory and to gain insight into

the relationship between the model parameters and its behaviour under known

conditions. Secondly, the model is used with a number of actual sounds with the aim

of achieving the best possible performance according to all, but mostly the fifth,

criteria.

7.7. Experiments with a Lorenz Time Series

In the following experiments, the time series used as input to the analysis has been

generated with a numerical simulation of Lorenz's chaotic system. The method of

simulation and the system parameter values are those given in [bidl92]. The Lorenz

system is a set of three differential equations:

$\dot{x}_n = \sigma(y_n - x_n)$
$\dot{y}_n = R x_n - y_n - x_n z_n$
$\dot{z}_n = x_n y_n - b z_n$ (7.73)

which are numerically integrated using


$x_{n+1} = x_n + \dot{x}_n \Delta t$
$y_{n+1} = y_n + \dot{y}_n \Delta t$
$z_{n+1} = z_n + \dot{z}_n \Delta t$ (7.74)

with parameter values

$\sigma = 10.0, \quad R = 28.0, \quad b = 2.67, \quad \Delta t = 0.01$ (7.75)

which put the system into a chaotic regime. The time series derives from observing

one of the three state variables, in this case x, after any transients have decayed. This

provides a time series from a known, stationary, noise-free, low-dimensional (d=3),

numerical chaotic system with which to test the sound model. The results presented on

the following three pages have been generated by varying each of the three analysis

parameters, N, m, and Q, in turn while keeping the other two fixed. Each set is

presented on a separate page showing a phase portrait derived from the original time

series and a number of portraits of the synthetic time series. All phase portraits are

derived from 10,000 samples of their respective time series, where possible, and have

a delay value of 10 samples. Also shown are graphs of the prediction error for one

time step ahead against the variable parameter.
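For completeness, the following sketch (Python, illustrative; not the simulation code of [bidl92]) generates such a time series by Euler integration of Equations (7.73)-(7.74) with the parameter values of (7.75), discarding an initial transient and returning the x variable.

```python
import numpy as np

def lorenz_series(n_samples, sigma=10.0, R=28.0, b=2.67, dt=0.01,
                  transient=1000):
    """Euler-integrate the Lorenz equations and return the x-variable
    after discarding an initial transient."""
    x, y, z = 1.0, 1.0, 1.0                  # arbitrary initial condition
    out = np.empty(n_samples)
    for n in range(transient + n_samples):
        dx = sigma * (y - x)
        dy = R * x - y - x * z
        dz = x * y - b * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        if n >= transient:
            out[n - transient] = x
    return out
```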


Figure 7.3 Lorenz input, N=10,000, Q=256 and a variety of embedding dimensions, m. Panels: (a) original; (b) prediction error $\sigma(1)$ against m; (c) m=2; (d) m=3; (e) m=4; (f) m=6; (g) m=7; (h) m=10.


Figure 7.4 Lorenz input, N=10,000, m=7, and a variety of numbers of domains, Q. Panels: (a) original; (b) prediction error $\sigma(1)$ against Q; (c) Q=16; (d) Q=32; (e) Q=64; (f) Q=128; (g) Q=256; (h) Q=512.


Figure 7.5 Lorenz input, Q=64, m=7 and a variety of original time series lengths, N. Panels: (a) original; (b) prediction error $\sigma(1)$ against N; (c) N=750; (d) N=1,000; (e) N=5,000; (f) N=10,000; (g) N=50,000; (h) N=75,000.


The results presented in Figures 7.3 to 7.5 show three main features. Firstly, they

confirm that the proposed model in the form described is successful at modelling the

dynamics of a chaotic time series via the embedded attractor. Secondly, that the

performance of the model depends considerably on the values of the analysis

parameters, m, Q, and N. And thirdly, that the one-step prediction error, $\sigma(1)$, does

relate to the performance of the model.

By successful it is meant that the model is capable of creating a synthetic system

that generates a trajectory in state space lying on an attractor that is similar to the

original/embedded one. This can be seen by comparing the phase portrait derived

from the original time series with a number of those derived from the synthetic

system, for example 7.3(g) and 7.4(f), and seeing that they have the same overall

form, size and features. It is then expected that the similarity of the attractors

corresponds to statistically similar observed time series. Figure 7.6 shows portions of

the original time series and the synthetic one used to generate the phase portrait in

7.4(f). The synthetic system in this case was initialised with a vector derived from the

first seven (equal to the dimension of the embedding space) values of the original time

series. It can be seen that, as expected, the synthetic time series approximately

matches the original for the first few time steps, corresponding to accurate short-term

prediction, and then diverges rapidly, a consequence of sensitive dependence on initial

conditions. The overall form of the two waveforms, however, can be seen to be very

similar.

Figure 7.6 Time series plots from the original Lorenz system (left) and the synthetic one shown as phase portrait 7.4(f) (right).

To partially compare the statistics of the two time series, Figure 7.7 shows an

estimate of their pdfs in the form of histograms of their respective amplitude values.

These were calculated with time series of length 10,000 samples and 100 bins of equal

width. Again, it can be seen that the form of the graphs is close enough to conclude that the model is working: it generates a synthetic time series with statistics similar to the original.


Figure 7.7 Estimates of amplitude probability distributions, P(a) against a, for original (left) and synthetic (right) time series shown in Figure 7.6.

It can be seen from the results that the similarity of the phase portraits depends

considerably on the parameters that define the analysis. In particular, Figure 7.3 shows

there to be a large increase in similarity from the ill-formed result when m=2 to when

m=4. As m is further increased, there is not such a dramatic change in the result. This

behaviour can be seen to be paralleled by the value of the one-step prediction

error $\sigma(1)$ where there is a large drop in its value between m=2 and m=4 and then little

change from m=6 onwards.

According to the theory presented earlier in this chapter, the embedding procedure

should only preserve the trajectories and attractor of the original system when the

embedding dimension m satisfies Equation (7.10),

$m \geq 2d + 1$ (7.76)

Consequently it would be expected that there should be a significant change in performance between the cases where m<7 and m≥7, since d=3 for the Lorenz system. It is not known why, in this case, this does not occur and the model can be seen to produce good results for m<7. The value of $\sigma(1)$, however, is at a

minimum for m=7.

Figure 7.4 shows that there is also an increase in similarity of the attractors as the

number of domains of the partition, Q, increases and that there is a point, in this case

approximately Q=128, where further increase results in no appreciable improvement.

This relationship is again paralleled by the variation in $\sigma(1)$. Note, however, that there is not an exact relationship between the accuracy of the model and the value of $\sigma(1)$ as

can be seen by comparing Figures 7.3(e) and 7.4(c) which have similar corresponding

prediction error values of ~2.5e-4 and yet differ in their similarity to the original

attractor.


The third set of results, shown in Figure 7.5, reveals another similar trend. This time, the quality of the results improves with an increase in the length of the original time series, N, and again this is strongly related to the value of $\sigma(1)$. The results shown in (c) and (d) are examples of unstable synthesis. The numbers of iterations before the output time series goes out of range are 182 and 2368 respectively. The phase portraits

show clearly the trajectory leaving the site of the intended attractor. Note how these

unstable systems correspond to prediction errors that are considerably higher than the

other stable ones. Finally, notice that the best performance for this set, shown in (h),

corresponds to the lowest prediction error.

To conclude on these results, it appears that there are definite relationships

between the performance of the model and the analysis parameters. These can be seen

by visual inspection of the phase portraits and by the value of the one-step prediction

error. The exact nature of these relationships, however, is not known. For the Lorenz

case it appears that performance can be maximised by maximising the length of the

original time series, N, and the number of partition domains Q while optimising the

embedding dimension according to the state space dimension of the original system

and Equation (7.10). The objective of the model is to maximise the performance so

that it is perceptually acceptable while minimising the complexity of the synthetic

system mapping. The above relationships, then, should be considered along with the

fact that the number of parameters required to specify the synthetic system mapping, $\tilde{g}$, is proportional to both m and Q, but not N.

Finally, to be specific about the synthetic system complexity, the example used so far of a successful synthetic system, whose phase portrait is shown in Figure 7.4(f), is described by

$P = Q(m + 2) = 128(7 + 2) = 1152$ parameters (7.77)

and correspondingly

$B = Q(8m + 22) = 128(56 + 22) = 9984$ bytes (7.78)


7.8. Experiments with Sound Time Series

This section presents results generated with the sound model using a test set of six

different sounds as input. The six sounds divide into three pairs of similar type such

that one of the pair is a more complex example of that type than the other. The three

types are: air noises, gong sounds, and musical tones. This test set has been chosen

because the sounds are the product of what are, or are believed to be, nonlinear

dynamical systems. This choice has been made on the basis of the physical nature of

the system, the type of behaviour of the sound time series, and the knowledge of

subjects discussed in Chapter 4. To summarise: the air noises are products of turbulent

fluid systems, one of which is wind noise and known to have fractal properties; gongs

are nonlinear systems exhibiting properties associated with chaos whose sound

waveforms are irregular and complex; and the musical tones are examples of

nonlinear systems exhibiting limit cycle behaviour. Each sound will be discussed in

greater detail in the forthcoming subsections. Note also, that all sounds have been

represented with 16-bit digital audio using a sampling rate of 48kHz.

As mentioned in the section on experimental technique, the intention of the work

described in this section is to ascertain whether the sound model works for real sound

time series having confirmed that it works for a synthetic chaotic signal as described

above. The aim is therefore to find as much useful evidence as possible that will allow

a conclusion to be drawn and to give insight into the nature of the model for possible

future work.

7.8.1. Air Noises

The two air noises are described as 'fan rumble' and 'wind noise'. The first of these

was created by fixing a microphone in a constant stream of air produced by a small

ventilation fan. This causes a low frequency, irregular rumble to be induced within the

head of the microphone which can be monitored through a microphone amplifier. The

microphone used was a Sony electret condenser type ECM-979. Varying the position

of the microphone transversely to the direction of the air stream varies the quality of

the sound produced. When the microphone is on the edge of the airstream, only a

quiet, high frequency hiss can be heard. At a certain point moving towards the centre

of the air stream, a louder, deep, irregular rumble also starts to appear. The severity

and volatility of this sound increases as the microphone is moved into the centre of the

air stream. The rumble is due to turbulence in the air stream as it passes through the

head of the microphone. Furthermore, it is believed that the point at which the rumble

occurs corresponds to weak turbulence and therefore low-order chaotic dynamics.


This idea is supported by the fact that several systems of fluid flow display low-order

chaotic dynamics during the onset of fully turbulent behaviour. That is, as some

controlling parameter is increased, for example the speed of fluid flow, the dynamics

of the system follow a bifurcating sequence which includes low-dimensional chaos

before becoming fully turbulent [crut86] and [goll75].

Several seconds of the fan rumble sound were sampled at a rate of 48kHz and low

pass filtered to remove the high frequency hiss as well as other extraneous noise

entering the microphone so as to leave just the low frequency rumble. The cut-off

frequency of this filter was approximately 3kHz. This has then been processed with

the sound model with a variety of analysis parameters. Table 7.1 shows a summary of

the resulting prediction errors and descriptions of the output time series for a selection

of analysis parameters.

experiment | length of original time series, N | embedding dimension, m | minimum points per domain, M | resulting number of domains, Q | prediction error $\sigma(1)$ | comments on synthetic time series
rc159 | 5,000 | 5 | 10 | 258 | 0.0056 | unstable after 194 iterations
rc158 | 25,000 | 10 | 20 | 1020 | 0.0015 | unstable after 7457 iterations
rc157 | 12,000 | 20 | 40 | 255 | 0.0014 | irregular modulation of sinusoidal oscillation
rc156 | 25,000 | 20 | 40 | 512 | 0.0013 | slight irregular modulation of sinusoidal oscillation
rc127 | 50,000 | 20 | 40 | 1021 | 0.0018 | best result - similar to original
rc132 | 100,000 | 20 | 40 | 2047 | 0.0013 | as good as above, but no better
rc155 | 50,000 | 5 | 80 | 512 | 0.0012 | transient leading to nearly periodic oscillation
rc129 | 50,000 | 10 | 80 | 512 | 0.0014 | as above, but with very long transient
rc126 | 50,000 | 20 | 80 | 512 | 0.0012 | similar, but lower amplitude version of original
rc130 | 50,000 | 30 | 80 | 512 | 0.0013 | as above
rc131 | 50,000 | 40 | 80 | 512 | 0.0014 | as above
rc128 | 50,000 | 20 | 160 | 256 | 0.0011 | as above

Table 7.1 Summary of results using fan rumble sound as input to the dynamic model.


Time series plots and phase portraits for both the original and the best synthetic

time series result, rc127, are shown in Figure 7.8. The mapping complexity for the

synthetic system is, in this case,

$P = Q(m + 2) = 1021 \times 22 = 22{,}462$ parameters (7.79)

which corresponds to

$B = Q(8m + 22) \approx 186$ kbytes (7.80)

Figure 7.8 Time series plots and phase portraits for: left, original fan rumble sound and right, best synthetic output, rc127.

As can be seen, there is a strong similarity between the two sounds when viewed

in these domains. The similarity is not as strong as that between the original and

synthetic Lorenz time series, but does show that some of the dynamic characteristics

of the original are being captured. This result, however, reveals itself to be very good

when the two time series are compared as sounds. A 3 second length (~150,000

iterations) of the synthetic time series rc127 has been generated for this purpose. This

and the original sound can be heard as Sound examples 21 and 22. The synthetic

version captures many of the fundamental perceived qualities of the original, such that

the overall quality of the two is nearly indistinguishable. The difference that does exist is

that the original sounds slightly more 'boomy' than the synthetic version.


This is considered to be a very significant result for a number of reasons. Firstly, it

is a demonstration that the model can work for a real chaotic time series, and not only

for a synthetic one as in the previous section. Secondly, it shows that dynamic

modelling with a synthetic system whose strange attractor approximates that of the

embedded original can capture the perceived characteristics of a sound. I believe this

to be both an original idea and the first demonstration that it is possible.

Figure 7.9 shows some plots of the other results to illustrate the behaviour resulting from different analysis parameters. In general, it appears that these behaviours are

associated with state space trajectories that are 'stuck' in subsets of the embedded

original attractor, or at least unevenly distributed over it. For example, the result of

experiment rc126 shows, after a small transient, the trajectory settling to an attractor

that is similar to the inner part of the embedded original. The same applies to the

result of rc156, but the trajectory lies mostly on the outermost band of the original

embedded attractor. The result of rc157 shows an uneven distribution of the trajectory

where it stays at the edge and mid-part of the attractor much more than it does in the

middle. Note, however, the way in which this result captures the outermost parts of

the orbit, or spikes as they appear in the time plot, that exist in the original.

This tendency for the trajectory to be caught in regions of embedded state space for longer periods than the original, and the consequent unevenness of distribution, set these results apart from that of rc127, the best result. These results, however, also highlight what is different between rc127 and the original: a less exaggerated version of the same thing. Inspection of the time series plots in Figure 7.8 shows the same tendency for the output to change between different parts of the

attractor at a slower rate than the original, which is, by comparison, modulating with

greater volatility.

Note, however, that with these experiments, the one-step prediction error, $\sigma(1)$,

does not have such a strong relationship to the performance as it does with the Lorenz

case. This can be seen in Table 7.1 where although the error for experiment rc159 is

highest by a factor of 3 or 4 and is the most unstable, there is not much difference

between all the other errors. That is, there is no reliable pattern to the error values that distinguishes between the occurrences of different types of behaviour.


Figure 7.9 Time series plots and phase portraits for some more outputs from the sound model using the fan rumble as input (panels rc126b, rc129b, rc156b, rc157b). Note that only about a third of the length of the output appears in the phase portraits as it does in the time series plots, for the sake of clarity.


The second air noise to be used is the sound of the wind which has already been

discussed in Chapter 4 and shown to have a 1/f power spectrum. This is used in

contrast to the fan rumble signal as it is an example of strong turbulence and therefore

expected not to be low-dimensional chaos, but an example of something else, possibly

high-dimensional chaos. For discussions on the possible relationships between

turbulence, high-dimensional chaos and 1/f noise see [casd92], [bak91] and [mann80].

A portion of the wind noise has been chosen from a ten second recording which is

judged to be as steady-state in form as possible. This limits the resulting portion to

about one second (48,000 samples) for processing by the model. Figure 7.10 shows

time series plots, phase portraits and power spectra for the wind noise and

for the best of the results found covering a similar range of parameters as in the

previous experiment. In this case the analysis parameters were N=46,000, m=20 and Q=1008, resulting in synthetic system complexities of

$P \approx 22{,}000$ parameters and $B \approx 183$ kbytes.

As can be seen, a strong similarity is again apparent between the original and

synthetic time series. In this case, the similarity is particularly strong between the

power spectra of the two - the synthetic version having the same 1/f structure. In the

audio domain, the result is slightly disappointing given these strong similarities.

Although the synthetic version possesses elements of the original, including the same

'roaring' quality, it is more of a 'flapping' sound. The two sounds can be heard as

Sound examples 23 and 24.


Figure 7.10 Time series plots (first fifth of top plot shown magnified as second plot), power spectra and phase portraits for original wind noise, left, and synthetic version, right.


7.8.2. Gong Sounds

Figures 7.11 and 7.12 show time series plots, phase portraits and amplitude

histograms for the best results found for two gong sound inputs. The first is of a gently

struck gong, and the second of a hard strike. The analysis parameters and resulting

model complexities are shown in Table 7.2. For both results it can be seen that the

trajectories in state space for the synthetic systems approximately match those of the

originals, but that the time series themselves appear different. In particular, the long

term structure, which for the gently struck gong is present as a strong fundamental

periodic component, is not preserved in the synthetic time series. This results in the

synthetic versions sounding quite unlike the originals. The original softly-struck gong

can be heard as Sound 25 and the synthetic version as Sound 26. The original and

synthetic versions of the hard-strike gong sound can be heard as Sounds 27 and 28

respectively. Despite this, it can be seen that the amplitude histograms of the synthetic

time series are seen to have the same overall form as those of the originals.

Note the difference in models used for these two examples. In the lightly-struck

case, a high model order has been used with a relatively low number of partition

domains. For the hard strike the opposite is the case: a very low-order model, with a

relatively high number of partition domains.

description of sound | length of original time series, N | embedding dimension, m | number of domains, Q | prediction error $\sigma(1)$ | synthetic system complexities P and B
lightly struck gong | 20,000 | 30 | 241 | 0.0028 | 7,712 params, 63 kbytes
hard strike gong | 10,000 | 3 | 1024 | 0.0063 | 5,120 params, 47 kbytes

Table 7.2 Summary of analysis parameters for best results using gong sounds.


Figure 7.11 Time series plots, phase portraits and amplitude histograms (P(a) against amplitude, a) for original, left, and synthetic, right, lightly-struck gong sound. Both amplitude histograms were computed with 10,000 samples and 100 bins.


Figure 7.12 Time series plots, phase portraits and amplitude histograms (P(a) against amplitude, a) for original, left, and synthetic, right, hard-strike gong sound. Both amplitude histograms were computed with 10,000 samples and 100 bins.

7.8.3. Musical Tones

The two musical tones chosen for analysis were recorded from a tuba and a

saxophone. Both extracts of the tones were devised to be as constant as possible,

avoiding any transient qualities or time-varying effects such as vibrato. These are

examples of nonlinear systems providing a periodic excitation to a resonator and

therefore are expected to exhibit limit cycles in their state spaces. The difference

between the two is that the tuba tone is a much purer tone than that of the saxophone, having a less complex spectral structure.

The original tuba waveform is very nearly a regular sinusoid. The spectrum of this signal, however, shows the presence of a number of harmonics, and the


phase portrait reveals the slight irregularity in amplitude causing a thickening of the

closed-loop limit cycle. These can be seen in Figure 7.13. Also shown is the best

synthetic version found, which is very close to the original. This similarity is also

preserved for the perceived sound which can be heard as Sound 29. The original can

be heard as Sound 30. As well as the result for the tuba sound being an accurate one, it

was also found to be possible with relatively simple models. In this case, only a four-

dimensional embedding and 32 partition domains were used. This results in synthetic

system complexity that is lower by several orders of magnitude compared to many of

the other sounds used. Full details of the analysis parameters for this and the next

experiment are shown in Table 7.3.

description of sound | length of original time series, N | embedding dimension, m | number of domains, Q | prediction error $\sigma(1)$ | synthetic system complexities P and B
tuba | 8,000 | 4 | 32 | 0.00032 | 192 params, 1,728 bytes
saxophone | 60,000 | 20 | 1024 | 0.0047 | 22.5 kparams, 186 kbytes

Table 7.3 Analysis details for the musical tones.

Figure 7.14 shows the time series plots and phase portraits for a saxophone tone

and the best synthetic version using the dynamic model. These can be heard as Sounds

31 and 32 respectively. Note the difference in complexity between this and the tuba

tone which can be seen in both the time series plots and phase portraits. The topology

of the trajectory in state space is the same as that for the tuba: a thickened closed loop.

In this case, however, the loop is tangled in state space due to the presence of strong

harmonics. The synthetic version is not as close to the original as it is for the tuba

tone. A particular fault, which also occurs with the gently struck gong, but not for the

tuba, is that the synthetic time series loses the essential periodic structure of the

original. This can be understood by considering the state space trajectories. For the

saxophone, parts of the trajectory pass close to one another and are therefore likely to

be confused by the analysis when the state space is divided into partition domains.

Consequently, synthetic trajectories cross over from one part of the attractor to another

due to inaccurate predictions. As a result, the trajectory stays close to the original

attractor, but does not cover parts of the loop in the same sequence as the original.


Figure 7.13 Time series plots, power spectra and phase portraits for original, left, and synthetic, right, tuba tones.

Figure 7.14 Time series and phase portraits for original, left, and synthetic, right, saxophone tones.


7.9. Conclusions

In this chapter, an original nonlinear dynamical analysis/synthesis model has been

proposed, implemented and tested with a number of sound time series. The overall

conclusion is that the results are good enough to confirm the feasibility of this

approach and to warrant further investigation. The main results are demonstrations:

1) of the ability of the model to recreate the Lorenz attractor to a high degree of

accuracy with a synthetic system;

2) that the modelling of an attractor can be sufficient to preserve the perceived

characteristics of sound, especially the irregular fan rumble sound and the regular tuba

tone;

3) that other sounds can be partially modelled such that the spectrum or amplitude

pdf is preserved although the perceived sound is not so well preserved;

4) that the one-step prediction error relates to the similarity of the synthetic and

original attractors for the Lorenz case;

5) that the relative one-step prediction errors relate well to the relative

performance with different time series (see below).

The performance of the model has been evaluated by both qualitative and

quantitative means. The former includes subjective comparisons of the perceived

sounds as well as visual inspection of the time series plots, phase portraits, power

spectra and amplitude histograms. From these it can be concluded that the model is

preserving characteristics of the original time series. Each sound has been analysed

using a similar range of analysis parameters with the aim of maximising the

performance of the model using the qualitative criteria. There is some variety to the

degree of success depending on the source of the time series. This relative, qualitative

assessment is paralleled quantitatively by the values of the one-step prediction errors.

This can be seen by inspection of Figure 7.15 which orders the best prediction errors

obtained for each sound as: Lorenz, tuba tone, fan rumble sound and the worst as

hard-strike gong. This ranking agrees with what was found qualitatively.

It was found that the analysis parameters have a strong relationship to the

performance of the model. In general it was found that increasing the order of the

model, m, the number of partition domains, Q, and the length of the original time

series, N, improves performance, but that there is a limit to the best performance

depending on the time series used. This limit is also paralleled by a floor on the value

of the prediction errors which tend to be approximately the same for the set of best

results for any one time series.


Figure 7.15 Relative one-step prediction errors for the best results found for each of the time series: Lorenz 0.00018, tuba 0.00032, fan rumble 0.0018, gong (gentle) 0.0028, wind 0.0029, saxophone 0.0047, gong (hard) 0.0063.

There are a number of possible reasons for the limits on performance of this

particular model. The following are four main suggestions. Firstly, that the time series

used is not produced by a low-order chaotic system, or one that has a low-order

attractor in its state space. Since no attempt has been made to diagnose low-order

chaos in the time series before modelling, it is not possible to confirm this. Instead,

the degree of the performance of the model itself may be viewed as diagnostic

evidence for chaos.

Secondly, that the excerpts of time series used do not correspond to the steady-

state behaviour of a system that has settled, after transients, to an attractor. This is

most likely for the gong sounds which are intrinsically transient. Their long term

behaviour is actually a rest state corresponding to a fixed point attractor. The

transients are, however, very long, taking up to a minute to die away. For this reason

the excerpts were assumed to be pseudo-stationary over the portion used for the

analysis. This might not, however, be satisfactory for the model, which assumes an

ergodic, stationary input. A consequence of this problem is that there is also a conflict

between the effects of the original time series length parameter, N. One effect of

increasing N is that it provides the analysis with more data which generally improves

the performance of the model as it did for the Lorenz input - see Figure 7.5. Against this is that the pseudo-stationarity of the input is more likely to be violated as its

length is increased.

An alternative might be to drive the gong with a regular, steady state excitation,

such as a motorised periodic beater, configured so that the sound produced is constant

in quality. A similar problem may occur for the musical tones, especially that of the

saxophone, where the sound, although contrived to be constant, suffers from the


irregularities of human performance. Again, the performer could be replaced with a

mechanical air source, but it is also felt that it might be an advantage to model these

irregularities as they are, as they contribute to the natural quality of the sounds. For the

tuba tone, these irregularities have been preserved to an extent. This can be seen from

the preservation of the thickened form of the state space attractor seen in the phase

portraits - see Figure 7.13.

Thirdly, although a variety of analysis parameters have been used with each time

series, the form of the state space partition is fixed for the model. The means of

partitioning the embedded vectors was based mainly on the computational

convenience of implementation. Recall that it divides all the embedded vectors among the domains so that each domain contains approximately the same number of them. The form of the partition

therefore has some relationship to the form of the attractor, but it is not designed to

optimise the performance of the model.

Finally, a possibility already mentioned and related to the state space partition is

that of trajectories being confused during the analysis. For example, neighbouring

trajectories on different parts of the attractor may be included in the same domain

during the analysis and consequently averaged out by the map fitting procedure.

Consequently, during synthesis, the mapping in that domain will not represent either

of the original neighbouring trajectories, but will be an amalgam of the two and

therefore unlike the original.

Another important point to reconsider is: what is actually expected of the model's

performance? In practice, the synthetic versions of the fan rumble and tuba sounds

preserve both the form of the attractor and the perceived sound. For the gently struck

gong sound, however, although the form of the attractor was approximated, the

perceived sound was not. For both the tuba and gong, it is long term correlations of

the time series which are important to the nature of the sound since they have strong

periodic elements.

The initial mathematical analysis of the model, however, concluded with the result

that only the mth-order jpdf of the original time series should be preserved given that

the time series is stationary due to the presence of an ergodic attractor - see Section

7.2. In other words, it can only be expected that the short-term structure of the time

series will be preserved. This is illustrated by the result obtained using the softly

struck gong sound and shown in Figure 7.11. It can be seen that the forms of the

original and synthetic attractor are similar, indicating that the synthetic system

preserves the set of possible states of the original; that the amplitude histograms are

similar, indicating that the probability distributions of one component of these states

over the attractor are similar (this distribution may be thought of as the view of a slice


through the centre of the attractor); but that the synthetic time series lacks the same

regularity present in the original. Detailed examination of the synthetic time series

reveals that it is in fact made up of shorter sequences from the original strung together

without the same long term order as the original. The regularity of the original

corresponds to a long term correlation that the model cannot be expected to preserve if

it only preserves its m=30th order jpdf. Correlations between samples up to 30 time

steps apart can be expected to be preserved, but not correlations over 30 steps. This is

because the autocorrelation function of a stationary stochastic process relates to its

jpdf. The autocorrelation is the expected value of the product of the process with a

delayed version of itself as a function of the delay. Let

$\{ X_t \}, \quad t \in \mathbb{Z}, \qquad (7.81)$

be a stochastic process, then the autocorrelation function is defined as

$a(\tau) = E\!\left[ X_t X_{t+\tau} \right] \qquad (7.82)$

the expected product of the process with a delayed version of itself. The expected value can be calculated from the jpdf of the process,

$a(\tau) = \iint x_t\, x_{t+\tau}\, P\!\left( x_t, x_{t+\tau} \right) dx_t\, dx_{t+\tau} \qquad (7.83)$

and so the autocorrelation function is directly related to the jpdf. Thus if the model

produces a synthetic time series which preserves the mth order jpdf of the original, it

will also preserve the autocorrelation function up to a delay of $\tau = m$. Figure 7.16 shows

the autocorrelation functions of both the original and synthetic gong time series

computed by convolving the time series with themselves using a range of delay times.

As can be seen, the short term correlation is indeed preserved, whereas in the long

term it is not. The details show the functions having strong similarities up to $\tau = 30$, as expected, and somewhat beyond, but not over the long-term, $\tau > 200$.
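As an aside on how such a comparison can be computed, the following short sketch (Python with NumPy; the function and variable names are illustrative and not those of the implementation used for Figure 7.16) estimates the autocorrelation over a range of delays and could be applied to both the original and synthetic series:

    import numpy as np

    def autocorrelation(x, max_delay):
        """Estimate a(tau) = E[x_t * x_{t+tau}] for tau = 0 .. max_delay - 1."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        return np.array([np.mean(x[:n - tau] * x[tau:]) for tau in range(max_delay)])

    # Example: a noisy sine, standing in for 10,000 samples of a gong time series.
    t = np.arange(10000)
    series = np.sin(0.05 * t) + 0.1 * np.random.default_rng(0).standard_normal(10000)
    a = autocorrelation(series, 200)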

So, it can only be expected that the short term structure of the time series can be

preserved by the model. In the case of the fan rumble, this is sufficient to preserve the

perceived sound because of its inherent irregular nature and lack of long term

correlation, a consequence of it being chaotic. For the gently struck gong, however, it

is not sufficient as the sound contains the long term structure of a periodic component.

But, for the tuba, the model does preserve the long term structure and is therefore

achieving more than is expected. This is probably due to the simplicity of the tuba

sound since when a more complex periodic time series is used (the saxophone) again

only the short term structure is preserved.


Figure 7.16 Autocorrelation functions for original, left, and synthetic, right, gently struck gong sound. The upper plot shows the function up to 8,000 delays, and the lower up to 100 delays. Both were calculated by convolving 10,000 samples of the time series with itself for different delays.

One of the main aims of this work was to ascertain whether or not simple chaotic

systems could be used to represent complex and irregular sounds. The experiments in

this chapter have contributed positive evidence to show that chaotic systems can

indeed model sound, but no real attempt has been made to find simple chaotic

models. The work has been oriented towards preserving characteristics of the original

sounds by exploring the full ranges of analysis parameters available. The emphasis has

been on finding the best possible performance of the model and consequently the

resulting synthetic models are often far from being simple. For example, the model of

the fan rumble that produces perceptually similar results to the original (rc127)

requires over 22,000 parameters to define it. In data terms this is equivalent, for the

particular implementation used, to 186 kbytes. Relative to digital audio this is

equivalent to about 2 seconds of the original sound (16 bit, 48kHz sample rate). Since

the model is capable of producing unlimited quantities of the sound, it is an

efficient representation of the sound if, say, several minutes of the sound are required.

There is currently no knowing, however, how many of the model parameters are

relevant to the quality of the sound, or if and how much they can be quantised to

reduce the storage of the model. For example, the mapping parameters are stored to

double floating-point precision, which may be excessively accurate. The model has

been implemented with more regard to experimental convenience than to simplicity of


the resulting model. This is the case with, for example, the implementation of the state

space partition and the related search tree.

In one case, that of the tuba tone, since the model was able to preserve the tone

very well, the analysis parameters were varied with the specific intent of finding the

simplest model for which the quality of output could be maintained. Hence, the

relative simplicity of the resulting model which requires only 192 parameters.

Despite the large number of parameters associated with most of the synthetic

systems, their computational complexity is relatively low. Each iteration of the

synthetic system, and hence each output sample, requires looking up a search tree

followed by the computation of a linear mapping of the state vector. Running on a Sun

IPX Sparcstation the synthetic system generates the time series at a rate of

approximately 3 seconds per 1 second of digital audio. It is therefore easily within the

range of real-time generation, for example using a DSP. The analysis times, however,

range from about 30 seconds to several minutes.
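To make the per-sample cost concrete, the following is a minimal sketch of the synthesis loop (Python with NumPy; the partition lookup is represented here by a plain function argument rather than the search tree of the actual implementation, and all names are illustrative):

    import numpy as np

    def synthesise(z0, domain_of, maps, n_samples):
        """Iterate the locally linear synthetic system.

        z0        : initial m-dimensional state vector
        domain_of : function mapping a state vector to a partition-domain index
                    (stands in for the search-tree lookup)
        maps      : list of (a, b) pairs, one local linear map per domain
        """
        z = np.array(z0, dtype=float)
        out = np.empty(n_samples)
        for n in range(n_samples):
            j = domain_of(z)                       # partition lookup (search tree in practice)
            a, b = maps[j]
            new = float(np.dot(a, z) + b)          # local linear map of the state vector
            z = np.concatenate(([new], z[:-1]))    # shift the new value into the state
            out[n] = new
        return out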

7.10. Further Work

A number of areas for further work have been identified and are presented in the

following sub-sections.

7.10.1. Using the Same Model with More Sounds

An area of immediate further work would be to use the model as it is and

experiment with more sound time series. There are three possibilities: to use new versions of the sounds already used; to use other naturally occurring sounds; or to use sounds produced from experiments contrived to produce chaotic behaviour. As mentioned

earlier, using gong sounds generated with a regular, possibly mechanical, excitation to

reduce transient qualities may improve the performance of the model. Some

experiments, not reported here, were conducted with a possibly chaotic, overblown

saxophone sound. This sound, however, was found to be very unstable and difficult to

maintain for long enough periods. Some work concentrating on mechanical blowing

techniques might allow better time series such as these to be generated for analysis.

To try the model with other naturally occurring sounds, more recordings need to be

made that, with judgement, may be suited to the modelling technique - e.g. steady

state, irregular sounds. This would involve, for example, more 'field work' with a

portable tape recorder or time spent examining sound libraries. The third idea is to set

up more experiments with the intention of producing chaotic sounds, as was the case


with the fan rumble sound. This could involve constructing physical systems with

known chaotic behaviour, for example more turbulent fluid systems, or forced

oscillations of nonlinear systems. A number of such systems are described in

[moon87].

7.10.2. Optimising the Synthetic Mapping

The next suggested area of further work concerns modifying the form of the model

with both the aims of improving performance, in terms of preservation of properties

from original to synthetic time series, and of reducing the complexity of the synthetic

system. The most immediate option is to experiment with other forms of the

embedded state space partition.

Recall that the partition is generated with respect to the embedded vectors such

that each partition domain contains at least M of them. Within each domain of the partition, a

linear function is then fitted to the data pairs associated with the vectors using a least

squares algorithm. This minimises

$e_j = \sum_{i:\, \mathbf{y}_i \in D_j} \left( l_j(\mathbf{y}_i) - u_{i+1} \right)^2 \qquad (7.84)$

for each domain (see Equation (7.52)). Minimising all of these effectively minimises

the difference between the embedded system function and the synthetic system

function,

$d_{\mathrm{map}}(g, \tilde{g}) \qquad (7.85)$

for the particular partition used (see equation 7.67). Different forms of partition may

allow for (7.85) to be reduced below the limit imposed by the recursive partition used

in the model. As mentioned in the previous section, this particular partition was

chosen for computational ease, and not with the aim of minimising (7.85). A better

approach would be to achieve an overall minimisation of (7.85) with respect to both

the partition and the linear mappings simultaneously. This may possibly be achieved

with some nonlinear global optimisation routine, such as simulated annealing or a

genetic algorithm - see [gold88] and [laar87]. Also useful in this context is the one-

step prediction error which gives an estimate of (7.85). The results presented in this

chapter have shown that the value of prediction error is related to the performance of

the algorithm - the lower the error, the greater the similarity between original and

synthetic time series. It is therefore expected that any scheme that can reduce the

prediction error would improve the quality of the model's results. Hence it is


suggested that a good approach would be a global optimisation of the one-step

prediction error with respect to both the mappings and the partition parameters.
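As an illustration of the kind of routine envisaged, the sketch below shows a generic simulated-annealing loop (Python; in this application the objective passed in would be the one-step prediction error as a function of the partition and mapping parameters, which is not defined here, so a toy quadratic objective is used purely as a stand-in):

    import math
    import random

    def simulated_annealing(params, objective, n_steps=10000, t0=1.0, cooling=0.999, step=0.05):
        """Generic simulated-annealing minimisation of `objective` over a parameter vector."""
        current = list(params)
        best, e_best = list(params), objective(params)
        e_current = e_best
        t = t0
        for _ in range(n_steps):
            candidate = [p + random.uniform(-step, step) for p in current]
            e_cand = objective(candidate)
            # Accept better moves always; accept worse moves with a temperature-dependent probability.
            if e_cand < e_current or random.random() < math.exp((e_current - e_cand) / t):
                current, e_current = candidate, e_cand
                if e_current < e_best:
                    best, e_best = list(current), e_current
            t *= cooling
        return best, e_best

    # Toy stand-in objective (NOT the prediction error): minimise a quadratic bowl.
    best, err = simulated_annealing([1.0, -2.0, 0.5], lambda p: sum(x * x for x in p))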

7.10.3. Stability Analysis

Another area of concern connected with the model's performance is that of

stability. Typically, when the analysis parameters are chosen to give very simple

synthetic systems, they tend to be unstable, the state space trajectories not being

attracted to a set, but tending to infinity. An understanding of this instability might

allow the simpler systems to be corrected to make them stable. Their attractors may

then be compared to the original/embedded ones. The main tool for examining

stability and determining the conditions for chaotic behaviour is the set of Lyapunov

exponents. Recall that these measure the rate of separation of neighbouring initial

conditions in state space. The polarity of the Lyapunov exponents relates to the type of

attractor that the system possesses. Ideally, then, an expression relating the synthetic

system mapping parameters to the Lyapunov exponents would be the most useful.

This would then allow, for example, the synthetic system mappings to be restricted

during analysis so as to maintain stable chaotic behaviour of the resulting system. It is

not known how to form this expression since the synthetic system mapping is

relatively complex. For example, a derivation exists for the chaotic Baker map, but

this consists of only two locally linear maps and has a symmetrical partition

[moon87]. Alternatively, the Lyapunov exponents may be found numerically by

iterating the synthetic system and measuring the separation of a set of initial

conditions.
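A sketch of such a numerical estimate is given below for the largest exponent of a one-dimensional map (Python; the logistic map is used as a stand-in for the synthetic system, which would require vector states and norms, and all names are illustrative):

    import math

    def largest_lyapunov(f, x0, n_iter=10000, d0=1e-8, n_transient=100):
        """Estimate the largest Lyapunov exponent of the map x_{n+1} = f(x_n)
        by tracking two nearby trajectories and renormalising their separation."""
        x = x0
        for _ in range(n_transient):          # let transients die away
            x = f(x)
        y = x + d0
        total = 0.0
        for _ in range(n_iter):
            x, y = f(x), f(y)
            d = abs(y - x)
            if d == 0.0:
                d = d0
            total += math.log(d / d0)
            y = x + d0 * (y - x) / d           # renormalise the separation to d0
        return total / n_iter

    # Stand-in example: the logistic map at r = 4 (exponent approximately ln 2).
    lam = largest_lyapunov(lambda x: 4.0 * x * (1.0 - x), 0.2)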

7.10.4. Connections with IFS

A theoretical concern is the connection between the synthetic system and

Iterated Function Systems (IFS). As was mentioned in the Inverse Problem section,

7.5, the locally linear form of the synthetic system mapping was chosen with the form

of IFS in mind. IFS have been referred to throughout this thesis as they are a well

understood and manageable framework for manipulating chaotic systems. If the

synthetic system could be made to be of the same form as an IFS then it would allow

full knowledge of the conditions under which it is stable and chaotic. It would also

allow the use of several other theorems relating to IFS such as the collage theorem.

What is currently known is that there are strong similarities between the synthetic

system and the Shift Dynamical System (SDS) form of an affine IFS. Both are locally


linear systems where the linear function is itself a function of the state of the system.

An SDS is the system

$x_{n+1} = S(x_n) \qquad (7.86)$

where

$S(x) = w_j^{-1}(x) \quad \text{when } x \in w_j(A) \qquad (7.87)$

for a set of non-overlapping and contractive maps, $\{w_j\}$. A is the attractor of the system. In the affine case the maps are of the form

$w(x) = \mathbf{M}x + \mathbf{N} \qquad (7.88)$

where $\mathbf{M}$ is a square matrix and $\mathbf{N}$ a vector.

The synthetic system is of the form

$\mathbf{z}_{n+1} = \tilde{G}(\mathbf{z}_n) \qquad (7.89)$

where

$\tilde{G}(\mathbf{z}) = \tilde{G}(z_1, z_2, \ldots, z_m) = \left( \tilde{g}(\mathbf{z}), z_1, \ldots, z_{m-1} \right)^T \qquad (7.90)$

and

$\tilde{g}(\mathbf{z}) = l_j(\mathbf{z}) \quad \text{when } \mathbf{z} \in D_j \qquad (7.91)$

describes the choice of map according to the partition and

$l(\mathbf{z}) = \mathbf{a} \cdot \mathbf{z} + b \qquad (7.92)$

describes the linear map.

The system mapping may therefore be rewritten as

$\tilde{G}(\mathbf{z}) = \begin{pmatrix} & & \mathbf{a}^T & & \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \mathbf{z} + \begin{pmatrix} b \\ 0 \\ \vdots \\ 0 \end{pmatrix} \qquad (7.93)$

which is a special case of (7.88) or its inverse which features in SDS. The difference

between the two systems, then, is the criterion by which the linear maps are chosen,

in other words, the form of the partition. In particular, the difference is in the

relationship between the linear mappings and the partition domains. For the SDS they

are directly related, but for the synthetic system they are independent, although in

practice there will be some relationship since both the domains and the linear maps

derive from the same set of embedded vectors.


These connections could be used to modify the form of the synthetic system to

make it like an IFS. It would also be worth exploring the connection with Recurrent

IFS, a more complex version of IFS, in which the partition and the linear maps are

related in different ways. If the synthetic system is made to be like some form of an

IFS, then the collage theorem may be employed as an error criterion for the inverse

problem. In fact something very similar is already being used for fitting the linear

function within each domain. Recall that the collage theorem relates to finding an IFS

whose attractor, A, best matches a given set, L. The aim is therefore to minimise

$d_H(L, A) \qquad (7.94)$

According to the collage theorem this can be done by minimising

$d_H\!\left( L, \bigcup_j w_j(L) \right) \qquad (7.95)$

which describes the closeness to L of a collage made of mapped versions of L, since

$d_H(L, A) \le \frac{1}{1 - s}\, d_H\!\left( L, \bigcup_j w_j(L) \right) \qquad (7.96)$

where s is the contractivity of the mappings. So, the better the collage, the closer the IFS attractor, A, to the desired set L.

Now, the inverse problem for the sound model can be seen to be similar. Given an

embedded attractor, the set B, the goal is to find a system whose attractor matches it.

In other words, to minimise

$d_H(B, \tilde{B}) \qquad (7.97)$

After the partition has been computed, the linear maps in each domain are fitted so

as to minimise each of the errors

$e_j = \sum_{i:\, \mathbf{y}_i \in D_j} \left( l_j(\mathbf{y}_i) - u_{i+1} \right)^2 \qquad (7.98)$

which is equivalent to the squared difference between a point in B and a mapped point

in B. The sum of all errors

$e_{\mathrm{tot}} = \sum_j e_j \qquad (7.99)$

is therefore equivalent to the difference between B and $\tilde{G}(B)$, which is a measure of

the closeness to B of a collage made of mapped versions of parts of B. So, the analysis

scheme used for the model is minimising some kind of collage error to give the

synthetic system mapping. Again, therefore, a strong similarity exists between the

synthetic system used for the sound model and an IFS which, if explored, it is

believed will provide greater theoretical understanding and practical help in

improving the model.


7.10.5. Time Varying Sounds

The final suggested area of further work concerns modelling time-varying sounds.

One of the main general conclusions drawn throughout this work is that the inherent

nature of naturally occurring sound and of interesting synthetic sound is not steady

state, but time-varying and transient. The model presented in this chapter works on the

assumption that the original sound comes from a system that has settled to ergodic,

stationary chaotic behaviour. A further goal, then, would be to develop a new model

that is itself inherently time-varying and therefore better suited to modelling natural

sounds as they are. Since the model as it stands has been shown to be successful at modelling irregular dynamics, it could therefore form the basis of a time-varying model. The

idea is then to model not only the dynamics at the level of the individual values of a

time series, but on a range of scales in a hierarchical, fractal manner. For example, one

system could be used to model the dynamics of short sequences of the time series and

then another used to model the dynamics of the systems themselves - i.e. the dynamics

of the parameters of the systems. Such a 'meta-dynamical' system could then

incorporate the techniques found to work so far while suiting time-varying sound. A

good example of such time varying sound and one worth considering for other

potential applications of the model is speech.


Chapter 8

The Poetry Generation Algorithm

This chapter is a sequel to the previous chapter in which the same framework is

used, but with a different model. The intention, as before, is to tackle the problem of

modelling a complex sound with an IFS. In the course of investigating this problem, a

solution to the roomtone problem is found.

8.1. Introduction

In the previous chapter, the central idea was to model a sound by modelling the physical measure, induced by embedding its time series, with the strange attractor and measure of a chaotic system. Using the notation of the previous chapter, let $u_n$ be the original time series. It was assumed that this may be equivalently interpreted as either the product of observing a chaotic system or the realisation of a stationary stochastic process, $\{U_n\}$. The time series is then embedded to give a system with embedded attractor, $B$, and associated measure. A synthetic system is then constructed in embedded state space to have an attractor, $\tilde{B}$, and measure that are similar to those of the embedded system. The synthetic system is constructed to be a deterministic chaotic system defined by a single mapping, $\tilde{G}$. This mapping is constructed with reference to a set of embedded vectors derived from the original time series.

In this chapter, an alternative to this approach is considered where the aim is to

construct the synthetic system to be the RIA version of an IFS. This approach

therefore requires a solution to the RIA variant of the IFS inverse problem: given a measure, find a set of IFS mappings and associated probabilities, $\{w_i, p_i\}$, that define an IFS attractor with an associated invariant measure that is similar to it. This chapter presents an approach to this problem which, although not a complete solution, is

considered to be progress in the right direction.

Recall that the RIA of an IFS is a Markov process with an invariant measure. That

is, the set of IFS contraction mappings and associated probabilities define a Markov

operator that leaves a unique measure unchanged by its action. Since the inverse

problem requires finding a set of mappings that define an invariant measure which

approximates some given measure, a first step in the solution might be to identify a


Markov process with an invariant measure similar to the desired one. That is, if $\nu$ is the given measure to be approximated, the first step of the solution is to find a Markov process defined by an operator, M, that possesses an invariant measure similar to $\nu$. The second step of the solution would then be to find a set of mappings and associated probabilities that define the Markov operator, M. This idea is summarised in Figure 8.1.

In the next section, an algorithm is introduced which suggests a solution to the

first part of the inverse problem and also a solution to the roomtone problem.

[Figure 8.1: schematic. Top line: the IFS mappings and associated probabilities, $\{w_i, p_i\}$, define a Markov operator, M, which in turn determines the invariant measure left unchanged by M. Bottom line: given some measure, $\nu$, find a Markov operator, M, which leaves $\nu$ invariant, $\nu = M(\nu)$, and from it a set $\{w_i, p_i\}$, the solution to the inverse problem.]

Figure 8.1 The top line shows the interdependence of the components of the RIA version of an IFS. The bottom line shows a suggested path to obtain a solution to the inverse problem.
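For reference, the RIA itself is straightforward to state in code. The following sketch (Python, with illustrative names) applies randomly chosen affine maps with their associated probabilities, using the standard Sierpinski triangle IFS as a stand-in example; the resulting orbit distributes according to the IFS's invariant measure:

    import random

    def random_iteration(maps_with_probs, n_points, x0=(0.0, 0.0)):
        """Random Iteration Algorithm: at each step pick an affine map w_i with
        probability p_i and apply it to the current point."""
        x, y = x0
        points = []
        maps, probs = zip(*maps_with_probs)
        for _ in range(n_points):
            a, b, c, d, e, f = random.choices(maps, weights=probs)[0]
            x, y = a * x + b * y + e, c * x + d * y + f
            points.append((x, y))
        return points

    # Standard example: three half-scale contractions give the Sierpinski triangle.
    sierpinski = [((0.5, 0, 0, 0.5, 0.0, 0.0), 1 / 3),
                  ((0.5, 0, 0, 0.5, 0.5, 0.0), 1 / 3),
                  ((0.5, 0, 0, 0.5, 0.25, 0.5), 1 / 3)]
    pts = random_iteration(sierpinski, 10000)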

8.2. Description of the Algorithm

The Poetry Generation Algorithm (PGA) is my own name for an algorithm that

has been found to exist in a number of forms, but which is usually presented as an

interesting diversion in the field of computer science [pi], [lanc]. It is a simple

algorithm that takes as input some passage of text, and then produces an output of

unlimited nonsense text that appears to have strong stylistic similarities to the original.

The example presented in Table 8.1 shows a short input passage and a number of

different output sequences obtained with different values of the algorithm's only

parameter, labelled 'seed length'.


Original Text (by Albert Einstein)
We are in the position of a little child entering a huge library whose walls are covered to the ceiling with books in many different tongues. The child does not understand the languages in which they are written. He notes a definite plan in the arrangement of the books, a mysterious order which he does not comprehend, but only dimly suspects.

Seed length=2
We not undere covere whichichilin mystang witionly suspects. He posite ary whilin whichild the libran to ton ton of the in ordered the plange lingues in they with books are a mystered then. The chto th hent only withe whose libran in orderstan the plangues. He lithe defin mystes not on ingwriterstang whichild the notent und to the derin they differionly sus ordefinitious note wittlecomprehe wrin thent und, arrandefin th tonges notent on manguage language linges. He alinguagement und ton th huges ared ent they dimly dimly dimly dimly a many a does on of ary differeare chichich but und the a does...

Seed length=3
We are ceiling a library which they are written. He not understand the position of the are walls are walls are ceiling with books, a library which the litten. He not coverenterious order whose walls are walls are ceiling a litten. He not understand the ceiling with books in whose written. He notes in the child ent of the ceiling a litten. He not comprehend, but only suspects. We are child does in many dimly suspects. We are comprehend, but only suspects. We arranguages notes notes notes not comprehend, but only different of a library which he does in many dimly suspects. We arranguages not understand to they are walls are in they...

Seed length=4
We are covered to they are written. He not understand the arrangement of the languages in the arrangement of the are covered to the arrangement of a little child enterious order which he does not comprehend, but only different tongues. The child enterious order which they are covered to the ceiling a huge library whose walls are covered to the ceiling a huge library whose walls are written. He not comprehend, but only different of the position of a little child enterious order which he does notes a definite plan in the are written. He not understand they are written. He not comprehend, but only dimly suspects. We are in the languages in the arrangement tongues. The ceiling a huge library whose walls are written. He not understand the...

Seed length=15
We are in the position of a little child entering a huge library whose walls are covered to the ceiling with books in many different tongues. The child does not understand the languages in which they are written. He notes a definite plan in the arrangement of the books, a mysterious order which he does not comprehend, but only dimly suspects. We are in the position of a little child entering a huge library whose walls are covered to the ceiling with books in many different tongues. The child does not understand the languages in which they are written. He notes a definite plan in the arrangement of the books, a mysterious order which he does not comprehend, but only dimly suspects. We are in the position of a little child entering...

Table 8.1 Example of the PGA acting on a short paragraph of text for a variety of values of the seed length parameter.

As can be seen, the generated text is a chopped-up, regurgitated version of the

original text that maintains certain orderings of letters and words. The effect is that the

output, although often nonsense, retains aspects of the style of the original. The

algorithm strikes a balance between merely repeating the sequences of letters in the

same order as they originally appeared and reproducing them in an unstructured,

random way. The single 'seed length' parameter allows control over the tuning of this

balance. For a small value of seed length, the output becomes highly jumbled. For a


sufficiently high value, the output merely loops the original sequence over and over,

with no change to its structure. This can be seen in Table 8.1 when seedlength=15.

To understand how the algorithm works, consider a much shorter input sequence,

THE_CAT_SAT_ON_THE_MAT_

where a space is denoted with an underscore and is considered to be an addition to the

26 letter alphabet. The algorithm treats the input sequence as a circular one where the

first 'T' follows the end space. This is shown in Figure 8.2.

[Figure 8.2: the letters of the sequence THE_CAT_SAT_ON_THE_MAT_ arranged around a circle, with an arrow indicating the order of the sequence.]

Figure 8.2. Input to the algorithm treated as a circular sequence.

Accompanying the circular register of letters is a smaller linear register known as

the 'seed'. For example, consider a seed whose length is two letters. This register is

initialised with any consecutive sequence of two letters from the original passage. For

example, let these be the first two, 'T H'. The initial contents of the seed may also be

considered to form the initial output of the algorithm. The algorithm then consists of

iterating the following sequence of operations:

1) Search for and record the position of all occurrences of the seed sequence in the original passage. For example, if the seed is 'T H', this occurs twice in the original sequence (shown in bold):

THE_CAT_SAT_ON_THE_MAT__

Because the seed is initialised with part of the original sequence, it can always be

found to occur at least once;

2) Choose one of the occurrences of the seed sequence. If there is more than one,

choose any of these at random with equal probabilities. For example, choose the

second one:


THE_CAT_SAT_ON_THE_MAT__

3) The next letter in the sequence after the chosen occurrence of the seed then

forms the output of the algorithm. Note that if the seed occurs at the end of the

original sequence, the first letter of the sequence is chosen. This is a consequence of

the circular treatment of the original. In this example, 'E' will be output;

4) Shift the output letter into the seed register from the right and discard the letter at the far left of the seed. In this example, the seed will change from 'T H' to 'H E',

with the 'T' being discarded.

These four operations are iterated as often as is required to form an output

sequence, one letter at a time. An example sequence of iterations for the algorithm is

shown in Table 8.2

iteration number     state of seed    occurrences of seed    occurrence chosen    output
at initialisation    TH               -                      -                    TH
1                    TH               2                      2nd                  E
2                    HE               2                      1st                  _
3                    E_               2                      1st                  C
4                    _C               1                      1st                  A
5                    CA               1                      1st                  T
6                    AT               3                      3rd                  _
7                    T_               2                      2nd                  T
8                    _T               2                      1st                  H
9                    TH               2                      1st                  E
...etc               ...              ...                    ...                  ...

Table 8.2 Example sequence of iterations of the PGA.

So for the example shown, the generated output sequence is

THE_CAT_THE...

Note that at the 8th iteration, the seed chosen is at the very end of the original

sequence and so the letter taken for the output is from the beginning.
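The four operations above translate directly into a short program. The following sketch (Python, with illustrative names; it follows the description given here rather than the program actually used to generate the examples) produces output from a passage of text:

    import random

    def pga(text, seed_length, n_output):
        """Poetry Generation Algorithm: generate n_output characters from `text`
        using the circular-register / seed-matching scheme described above."""
        n = len(text)
        # Initialise the seed with a consecutive subsequence of the original.
        start = random.randrange(n)
        seed = "".join(text[(start + k) % n] for k in range(seed_length))
        output = seed
        while len(output) < n_output:
            # 1) Find all (circular) occurrences of the seed in the original.
            positions = [i for i in range(n)
                         if all(text[(i + k) % n] == seed[k] for k in range(seed_length))]
            # 2) Choose one occurrence at random (at least one always exists).
            i = random.choice(positions)
            # 3) The character following the chosen occurrence becomes the output.
            next_char = text[(i + seed_length) % n]
            output += next_char
            # 4) Shift the new character into the seed, discarding the oldest one.
            seed = seed[1:] + next_char
        return output

    print(pga("THE_CAT_SAT_ON_THE_MAT_", seed_length=2, n_output=40))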

From this description it can be seen how this algorithm works. The seed is

initialised so that it contains part of the original sequence. When the seed is updated

with the shift operation, this situation is maintained and so there is always at least one

occurrence of the seed in the passage. Therefore, at each iteration there is always an


output and the seed can be updated. Consequently, any generated output sequence of

letters is guaranteed to have occurred in the same order as in the original passage. So,

for example, when the seed length is two, any consecutive two letter sequences in the

output can be found to occur somewhere in the original passage. It is this property of

the algorithm that maintains the similarity of the output with the original. Also,

therefore, the length of the seed will control how fragmented the output is. A higher

seed length ensures that the output always contains longer sequences that have

occurred in the original, such as whole words or even phrases.

When the seed length is one, there will be the highest number of occurrences of

the seed in the passage. As the seed length increases, it becomes more and more

difficult to find any particular sequence of letters. There is a limiting value for the

seed length at which there is only ever one occurrence of any seed sequence in the

original passage. If this occurs, the output is always the next letter in the original

sequence and no random jumps occur. The result is an output that consists of the

original sequence looped over and over again.

The function of this algorithm appears to be exactly that which is needed to solve

the roomtone problem. Recall from Chapter 2 that a solution to the roomtone problem

involves generating unlimited quantities of a sound given only a small fragment of

some original roomtone. The generated sound must have the same perceived

properties as the original and also not just be a periodically looped version of the

original. The PGA achieves this kind of function, but for passages of text. For a seed

length that is neither too high nor too low, for example when seedlength=4 in the

example shown in Table 8.1 , the output of the PGA is an unlimited passage of text

that has the same perceived qualities as the original passage and is not merely a

looped version of it. Further analysis of the PGA, which will be presented next, also

shows that it defines a Markov process on the embedded state space of the original

sequence that possesses an invariant measure. This is exactly what is required for the

first part of the solution to the RIA IFS inverse problem described in the introduction

to this chapter.


8.3. Analysis of the PGA

Consider the PGA to be a dynamical system where the seed represents the state of the system at discrete instances of time. Let the seed length be L, and write the state as a discrete vector

$\mathbf{x} \in X \qquad (8.1)$

where the state space, X, comprises the symbol alphabet set, A (26 letters plus the space character), combined with itself L times. That is,

$X = \underbrace{A \times A \times \cdots \times A}_{L \text{ times}} \qquad (8.2)$

Let the initial state of the seed be

$\mathbf{x}_0 \qquad (8.3)$

and let

$\mathbf{x}_n \qquad (8.4)$

denote the state of the seed after n iterations of the algorithm. Let

$\{ a_i \in A : 0 \le i \le I - 1 \} \qquad (8.5)$

represent the original passage of text. The state of the seed will then be restricted to the subset of state space defined by L-length sequences of letters taken from the original text passage. That is,

$Y = \left\{ \mathbf{x} = \left( a_i, a_{(i+1) \bmod I}, \ldots, a_{(i+L-1) \bmod I} \right) : 0 \le i \le I - 1 \right\} \subset X \qquad (8.6)$

where (mod I) represents the nature of the circular register used to store the original text sequence. Note that Y is the space of time-delay embedded vectors of the original text sequence. Let $|Y|$ be the number of distinct vectors in Y and let them be ordered arbitrarily so that

$Y = \left\{ \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{|Y|} \right\} \qquad (8.7)$

For each iteration of the PGA, the state of the system changes from one of the

states in the space Y to another. The sequence of state changes forms a trajectory in

state space. For example, Figure 8.3 shows part of the state space and some of the

possible states and trajectories for the case where the original text is the sequence

THE_CAT_SAT_ON_THE_MAT_. In this case the seed length, L, is two and

therefore the state space is two-dimensional.

The transition from one state to another is determined by the algorithm operation

rules given in the previous section. These declare that the state changes stochastically

such that the probability of the new state is entirely determined from the current state

of the system. Thus the dynamical system defined by the PGA is a first-order Markov

process, or a Markov chain, since it is discrete-time and discrete-state. A Markov

process is defined by the Markov operator, which in this case is a probability

transition matrix.


[Figure 8.3: a grid with the first letter of the seed on one axis and the second letter on the other (symbols _, A, C, E, H, M, N, O, S, T). States such as TH, HE, E_, _C, CA, _M, MA and _T are marked, with arrows showing transitions between them; where alternative trajectories leave a state, one is chosen at random.]

Figure 8.3 Part of the state space, X, corresponding to an example PGA showing some of the possible states and their associated transitions.

Let

$\boldsymbol{\pi}_n = \left( \pi_n(\mathbf{y}_1), \pi_n(\mathbf{y}_2), \ldots, \pi_n(\mathbf{y}_{|Y|}) \right) \qquad (8.8)$

be the state probability distribution at time n. This is a probability vector where $\pi_n(\mathbf{y})$ is the probability that the state of the PGA at time n is $\mathbf{y}$ and

$\sum_{i=1}^{|Y|} \pi_n(\mathbf{y}_i) = 1 \qquad (8.9)$

The probability distribution at the next time step is then determined entirely by the transition matrix, M,

$\boldsymbol{\pi}_{n+1} = \boldsymbol{\pi}_n M \qquad (8.10)$

or,

$\left( \pi_{n+1}(\mathbf{y}_1), \ldots, \pi_{n+1}(\mathbf{y}_{|Y|}) \right) = \left( \pi_n(\mathbf{y}_1), \ldots, \pi_n(\mathbf{y}_{|Y|}) \right) \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1|Y|} \\ p_{21} & p_{22} & \cdots & p_{2|Y|} \\ \vdots & & & \vdots \\ p_{|Y|1} & p_{|Y|2} & \cdots & p_{|Y||Y|} \end{pmatrix} \qquad (8.11)$

where $p_{ij}$ is the probability that the state will change from being $\mathbf{y}_i$ to $\mathbf{y}_j$. M is composed of probability (row) vectors so that

$\sum_{j=1}^{|Y|} p_{ij} = 1 \quad \text{for each } i \qquad (8.12)$


The values of the elements of the transition matrix are determined by the nature of

the original sequence. A transition probability will only be non-zero if the two states

appear consecutively in the original sequence. That is, if

$\mathbf{y}_i = \left( a_k, a_{(k+1) \bmod I}, \ldots, a_{(k+L-1) \bmod I} \right) \qquad (8.13)$

then

$\mathbf{y}_j = \left( a_{(k+1) \bmod I}, a_{(k+2) \bmod I}, \ldots, b \right) \qquad (8.14)$

where

$b \in A \qquad (8.15)$

The probability of going from state yi to y j will then be equal to the number of

times the y j appears in the original sequence divided by the number of times yi

appears. That is,

$p_{ij} = \frac{\#\mathbf{y}_j}{\#\mathbf{y}_i} \qquad (8.16)$

If there is only one state which follows yi then the probability of going to it from

yi is obviously 1. If there are two distinct states which follow yi, then the probability

of moving to either is 0.5. For the example sequence,

THE_CAT_SAT_ON_THE_MAT_

and when L=2, the space of distinct subsequences will be

Y = {_C, _S, _O, _T, _M, AT, CA, E_, HE, MA, N_, ON, SA, TH, T_} \qquad (8.17)

and so $|Y| = 15$. So, for example, the probability of transition from $\mathbf{y}_8 = $ E_ to $\mathbf{y}_1 = $ _C is $p_{81} = 1/2$ since $\#\mathbf{y}_8 = 2$ and $\#\mathbf{y}_1 = 1$.
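These transition probabilities can be tabulated directly from the original sequence. The sketch below (Python, with illustrative names) counts consecutive subsequence pairs around the circular register, which realises the uniform random choice over seed occurrences described in Section 8.2:

    from collections import defaultdict

    def transition_probabilities(sequence, L):
        """Tabulate the PGA's Markov transition probabilities between the L-length
        subsequences of a circular symbol sequence by counting consecutive pairs."""
        n = len(sequence)

        def subseq(k):
            return "".join(sequence[(k + j) % n] for j in range(L))

        counts = defaultdict(lambda: defaultdict(int))
        occurrences = defaultdict(int)
        for k in range(n):
            y_i, y_j = subseq(k), subseq(k + 1)
            counts[y_i][y_j] += 1
            occurrences[y_i] += 1
        return {y_i: {y_j: c / occurrences[y_i] for y_j, c in row.items()}
                for y_i, row in counts.items()}

    P = transition_probabilities("THE_CAT_SAT_ON_THE_MAT_", L=2)
    print(P["E_"])   # e.g. {'_C': 0.5, '_M': 0.5}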

Every possible state of the Markov chain may be reached from any other state and

also every possible state may recur. To see this, consider that the states happen to be

chosen in the same order as in the original sequence until it eventually loops round the

circular register. Such a Markov chain is termed 'irreducible, positive recurrent'.

Consequently, the Markov chain has the property that it possesses a unique stationary

distribution [hoel72]. That is, there exists a unique distribution, $\boldsymbol{\pi}$, such that

$\boldsymbol{\pi}_0 M^n \to \boldsymbol{\pi} \quad \text{as } n \to \infty \qquad (8.18)$

and

$\boldsymbol{\pi} M = \boldsymbol{\pi} \qquad (8.19)$

Or, in the language of IFS theory used in Chapter 3, the Markov operator defined

by the PGA possesses a unique invariant measure. The Markov chain is defined on

the space of embedded vectors, Y, which is a subset of the full state space X. The

above analysis has therefore shown that the PGA induces a Markov chain on the set of

embedded vectors of the original sequence, a chain which possesses an invariant measure


in state space. Different original sequences will define different subsets of X with

different invariant distributions.
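The stationary distribution of such a chain can be checked numerically by power iteration of the transition matrix. A minimal sketch follows (Python with NumPy; the small three-state matrix is an arbitrary illustration, not one derived from a text sequence):

    import numpy as np

    def stationary_distribution(M, n_iter=1000):
        """Power-iterate a row-stochastic transition matrix M until pi @ M = pi."""
        pi = np.full(M.shape[0], 1.0 / M.shape[0])   # arbitrary initial distribution
        for _ in range(n_iter):
            pi = pi @ M
        return pi

    M = np.array([[0.0, 0.5, 0.5],
                  [1.0, 0.0, 0.0],
                  [0.5, 0.5, 0.0]])
    print(stationary_distribution(M))   # satisfies pi @ M approximately equal to pi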

The PGA therefore presents a solution to the first half of the RIA IFS inverse

problem described in the introduction of this chapter. It provides a means of deriving a

Markov operator from an original sequence that defines a stochastic dynamical system

that can approximate the original sequence. The next section therefore considers using

the PGA with a digital audio time series as the original sequence instead of a passage

of text.

8.4. Implementation of the PGA for Sound

Keeping the notation of the previous chapter, let

$\{ u_i : 0 \le i \le I - 1 \} \qquad (8.20)$

represent an I-length sequence of some sound time series. Assume that this is a product of a chaotic system or, equivalently, a realisation of the stationary random process

$\{ U_n \} \qquad (8.21)$

which is partly described by the mth-order jpdf

$P_U\!\left( u_i, u_{i+1}, \ldots, u_{i+m-1} \right) \qquad (8.22)$

Alternatively, consider the set of m-length embedding vectors derived from the original time series,

$\left\{ \mathbf{y}_i = \left( u_i, u_{i-1}, \ldots, u_{i-m+1} \right)^T : m - 1 \le i \le I - 1 \right\} \qquad (8.23)$

which are distributed in embedded state space according to the embedded measure. The

measure in m-dimensional embedded state space and the mth order jpdf are equivalent

descriptions of the probability distribution of the original time series. The stationarity

of the original sequence un is equivalent to the invariance of the measure under the

action of the embedded dynamical system.

In the previous chapter, the aim was to find a deterministic chaotic system

possessing an invariant measure that approximates that of the embedded system. Now, the aim is to

implement a Markov dynamical system on the embedded state space that possesses a

stationary distribution that approximates the embedded measure. Again, the information present in the

transitions between embedded vectors is used to derive the dynamical system.

How, though, can the PGA be implemented for digital audio? There are a number

of important differences between using the PGA with text and using it for sound.

Firstly, the number of possible discrete symbols in a digital audio sequence is far

higher than the number in a text sequence. For example, 16-bit digital audio has

65,536 possible symbols, compared to 27 for text. This suggests a much lower


likelihood of finding several occurrences of subsequences of the original sequence

that are at least two symbols long. The algorithm can only function if several

occurrences of a seed can be found so that a random choice can be made. Another

difference is that useful digital audio sequences are typically far longer than the

lengths of text typically processed by the PGA. This therefore implies considerably

greater processing power is needed to maintain the operational speed of the PGA

which, for the text implementation used to generate the examples in Table 8.1, is

satisfactory.

To investigate both these issues, the PGA algorithm used to generate the text

examples was slightly adapted to accept digital audio sequences as input. Trial runs

with the program have established the following. Firstly, the large increase in the

alphabet size is not a problem. Multiple occurrences of subsequences up to 5 in length

are typical in digital audio extracts of ~1/2 second taken from a steady-state source

such as a roomtone. This appears to be because of the correlations present in such

signals. Any value that recurs in a time series is likely to have the same neighbouring

values also recurring. The second point established is that, as expected, the processing

time for digital audio inputs of ~1/2 second and outputs of several seconds are

prohibitive, being of the order of tens of hours. It is therefore necessary to consider

alternative implementations of the PGA so as to increase the speed of the algorithm.

There are two alternative implementations that represent extremes in

terms of processing speed and memory usage. One extreme is the algorithm

mentioned above and described in Section 8.2 in which the Markov transition

probabilities are calculated 'on the fly' by performing a comparison of the seed with all

subsequences of the original sequence. This algorithm has low memory usage as the

Markov transition matrix is not stored in its entirety, but a single transition probability

is calculated at a time. This algorithm, however, is slow because of the computational

cost of repeatedly searching the whole original sequence. An alternative to this

method is to calculate the transition matrix in its entirety, once. This would involve a

significant preprocessing procedure, but the result would be a matrix that could be

referred to directly as a look-up table. This would then make iteration of the algorithm

much faster. This implementation, however, would require enormous amounts of

memory to store the transition matrix. Recall that this matrix is a 2-dimensional

square matrix of size $|Y|$, where $|Y|$ is the number of distinct subsequences of the original sequence. Consider the following estimate of $|Y|$: say an extract of original time series

uses 5000 distinct amplitude levels. Say that the neighbouring samples of any given

sample are restricted to having only 100 different values out of 5000 due to

correlation. Then for a seed length of only L=2 there will be $|Y| = 5 \times 10^5$ distinct


subsequences of length 2. Therefore the transition matrix will have the square of this, which is of the order of $10^{11}$ elements, which is prohibitively large.

A novel implementation of the PGA has therefore been developed which is

somewhere between the two extremes described above. It does not calculate the entire

transition matrix, but uses additional memory to speed up the original search version

of the algorithm. The advanced algorithm is a variant of the algorithm described in

Section 8.2. It achieves a substantial increase in speed by preprocessing the original

sequence. This results in a reduction of the number of comparisons made with the

seed for each iteration. A greater amount of memory is needed, however, to store the

preprocessed data. The preprocessing operation sorts the original sequence into

ascending order while maintaining the whereabouts of the samples in the original

sequence. This is accomplished by forming am array of 2-dimensional vectors of the

data. One component of each vector stores an amplitude value, while the other stores

its position in the original sequence. The vectors are then sorted according to the value

of the first element, the amplitude value. This is done with a modified 'quicksort'

routine that uses the first element of the vector in the sort procedure, but always

maintains the pairing of the two elements when shuffling occurs. A simple example of

the action of the sort routine is shown in Table 8.3.

before preprocessing                 after
sample value    position in       sample value    position in
                sequence                          sequence
23              1                 -90             9
35              2                 -89             10
67              3                 -36             8
126             4                 12              7
98              5                 23              1
45              6                 35              2
12              7                 45              6
-36             8                 67              3
-90             9                 98              5
-89             10                126             4

Table 8.3 Simple example showing how the preprocessing reorders the original input sequence.

For the operation of the algorithm, a copy of both the above tables is kept. The

crucial time saving then occurs when the seed is being checked against the original

sequence. In the original version of the algorithm, the first element of the seed is

checked against every value in the original sequence. If a match occurs, then the

second element is checked and so on. Only when all elements match is a record kept


of the position in the sequence for the subsequent random choice part of the cycle.

With the above tables, however, any values of the original sequence that match the

first element of the seed can easily be looked-up in the sorted table using a binary

search. The associated position tag is then used in conjunction with the

unpreprocessed table to check the next values in the sequence against the seed. As a

result, only a few subsequences are tested for matching as opposed to a comparison

for every value in the original sequence.
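A sketch of this preprocessing and look-up step is given below (Python; the standard bisect module stands in for the modified quicksort and binary search described above, and all names are illustrative):

    import bisect

    def preprocess(series):
        """Sort (value, position) pairs by value, keeping the original positions."""
        return sorted((v, i) for i, v in enumerate(series))

    def seed_occurrences(series, sorted_pairs, seed):
        """Find positions where `seed` occurs in the circular `series`, checking only
        the positions whose first sample matches the first element of the seed."""
        n = len(series)
        values = [v for v, _ in sorted_pairs]
        lo = bisect.bisect_left(values, seed[0])
        hi = bisect.bisect_right(values, seed[0])
        matches = []
        for _, pos in sorted_pairs[lo:hi]:      # candidate start positions only
            if all(series[(pos + k) % n] == seed[k] for k in range(len(seed))):
                matches.append(pos)
        return matches

    series = [23, 35, 67, 126, 98, 45, 12, -36, -90, -89]
    pairs = preprocess(series)
    print(seed_occurrences(series, pairs, [23, 35]))   # -> [0]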

There is one final problem that was encountered when running trials of the

original PGA with sound time series. This is the presence of an amplitude

discontinuity between the end of the original time series and its beginning when it is

stored in the circular register. The end value in the sequence is presumed to be

followed by the beginning value even though they may differ considerably in

amplitude. Consequently, the output of the algorithm includes large discontinuities,

heard as clicks, which are not present in the original time series. The solution to this

problem is to cross-fade the end of the time series with the beginning as is done when

splicing two recorded audio signals together. A small linear envelope is applied to the

amplitude values at the beginning and end of the original time series. The beginning

and end of the time series are then added together when stored in the circular register.

That is, the original sequence now becomes

$\{ u'_i : 0 \le i \le I - E - 1 \} \qquad (8.24)$

where E is the length of the two envelope functions. The envelope function is defined as

$e^B_i = \frac{i}{E} \qquad (8.25)$

for the beginning of the time series and

$e^E_i = \frac{I - 1 - i}{E} \qquad (8.26)$

for the end. The original time series is then modified by the two envelopes to become

$u'_i = e^B_i\, u_i + e^E_{i+I-E}\, u_{i+I-E}, \qquad 0 \le i \le E - 1$
$u'_i = u_i, \qquad E \le i \le I - E - 1 \qquad (8.27)$

This process is depicted in Figure 8.4.


[Figure 8.4: diagram of the crossfade: linear envelopes applied over the beginning and end of the original time series, which are then added together to form the modified, shorter time series.]

Figure 8.4 Crossfade envelopes applied to beginning and end of original time series which are then added together to form modified time series. This is then stored in the circular register so that there is no amplitude discontinuity between its end and its beginning.
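A sketch of the crossfade, as reconstructed here, is given below (Python with NumPy; the exact envelope normalisation used in the original implementation may differ):

    import numpy as np

    def crossfade_circular(u, E):
        """Crossfade the last E samples of `u` into its first E samples so that the
        result can be stored in a circular register without an amplitude step."""
        u = np.asarray(u, dtype=float)
        I = len(u)
        ramp_up = np.arange(E) / E               # envelope over the beginning
        ramp_down = 1.0 - ramp_up                # envelope over the end
        out = u[:I - E].copy()
        out[:E] = ramp_up * u[:E] + ramp_down * u[I - E:]
        return out                                # length I - E, loops cleanly

    faded = crossfade_circular(np.sin(0.01 * np.arange(5000)), E=100)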

8.5. Results

In this section, results are presented for three sets of experiments with the digital

audio version of the PGA. The first set of experiments uses a single roomtone as the original time series with a range of combinations of the two algorithm parameters: the length of the original time series, I, and the seed length, L. After gaining an understanding of

the effects of these parameters, the algorithm is tried with two other roomtones having

different characteristics to the first. Finally, the PGA is tried with a number of other

naturally occurring sounds.

In the first set of experiments, an industrial roomtone has been chosen for the

original time series. This comes from a sound-effects library and is described as

'roomtone, small [room size], ventilation noise' [ssl89]. This has been chosen because

it is a constant, steady state sound without any extra artefacts such as knocks or

clunks. It is therefore likely to satisfy the condition of being a stationary random

process. Three extracts of this sound having different lengths, I, are used as input to

the PGA. These extracts have the first and last 100 samples windowed and then added

together to remove any amplitude discontinuity. The window length of 100 was

chosen to be the smallest value such that the windowed result, when played in a

continuous loop, has no perceivable clicks or introduced artefacts. Each processed

extract is then tried with a range of seed lengths, L. The PGA is stopped after a 3

second synthetic time series has been generated. The resulting sounds for each

combination of I and L are compared with the original roomtone to evaluate how good


the PGA has been at creating a longer version of the original with the same perceived

properties. These results are summarised in Table 8.4. A selection of the resulting

sounds, and the original roomtone are presented as sound examples and indicated on

the table. The original roomtone can be heard as Sound 33.

Figure 8.5 shows waveform plots of 300 and 3000 sample extracts of the original

roomtone. Figure 8.6(a) shows a plot of the resulting time series from the PGA when

I=300 and L=1. It can be seen that the output consists only of the sample values that

exist in the original which, in this case, are mostly negative. Figure 8.6(b) shows a

plot of the output when I=3000 and L=3. Here, the perceived 'buzzing' quality of the

result can be seen as the output consists of multiple, looped sections of the original.

Figure 8.6(c) shows the resulting time series when the PGA continuously loops the

original.

Length of original time series I = 300:
  L=1 (111): not too bad; sounds too much like white noise and lacks the low frequency depth of the original. Sound 34.
  L=2 (112): buzzy; large percentage of result is looped version of original.
  L=3 (113): buzzy.
  L=4 (114): buzz; continuously loops original.
  L=5 (115): buzz.

Length of original time series I = 3,000:
  L=1 (211): too fluttery and hissy.
  L=2 (212): good, but not smooth enough. Sound 35.
  L=3 (213): buzzy; large percentage of result is looped version of original. Sound 36.
  L=4 (214): buzz; continuously loops original. Sound 37.
  L=5 (215): buzz.

Length of original time series I = 30,000:
  L=1 (311): sounds too much like white noise and lacks the low frequency depth of the original.
  L=2 (312): very good; despite a slight lack of depth it is almost indistinguishable from the original. Sound 38.
  L=3 (313): also very good.
  L=4 (314): occasional, slight buzz.
  L=5 (315): occasional, slight buzz. Sound 39.

Table 8.4 Summary of results obtained with PGA and industrial roomtone as original time series. (Numbers in brackets are experiment identification.)


The very good perceived similarity of the result for I=30,000 and L=2 is confirmed

by comparison of the time series plots, power spectral densities and amplitude

histograms shown in Figure 8.7 where there is also a high degree of similarity between

the original and synthetic versions. Note, however, that the spectral peaks present in

the power spectral density of the original at about 60 and 120 Hz are not present in the

spectrum of the synthetic version. This explains the slight difference between the

sound of the original and synthetic, the original containing a slight hum, which the

synthetic does not. As pointed out in both the previous section and the previous chapter, only short-term correlations can be expected to be preserved, since only the Lth-order jpdf of the original is preserved by the PGA. It is therefore not guaranteed that a tonal element, which has long-term correlations, will be preserved.
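The power spectral densities in Figure 8.7 were obtained by averaging eleven 4096-point FFTs. A minimal sketch of such an averaged-periodogram estimate is given below (Python/NumPy); the use of a Hann window, the non-overlapping segmentation and the sample rate are assumptions for illustration and are not details taken from the text.

import numpy as np

def averaged_psd(x, nfft=4096, nseg=11, fs=22050.0):
    # Unnormalised averaged-periodogram PSD estimate: the signal is cut
    # into nseg consecutive nfft-sample segments, each is windowed and
    # transformed, and the squared magnitudes are averaged.
    # x must contain at least nseg * nfft samples.
    x = np.asarray(x, dtype=float)
    win = np.hanning(nfft)
    psd = np.zeros(nfft // 2 + 1)
    for k in range(nseg):
        seg = x[k * nfft:(k + 1) * nfft]
        psd += np.abs(np.fft.rfft(seg * win)) ** 2
    psd /= nseg
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)   # frequency axis in Hz
    return freqs, psd

A tonal component such as the 60 Hz hum in the original then appears as a peak in the corresponding frequency bin of one estimate but not the other.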

These results give some insight into the way in which the PGA operates. Firstly,

the longer the original time series, the better the results. There is, however, no need to

increase the length beyond a limiting value, as the quality of the result for I=30,000,

for example, is good enough for the synthetic version to pass as the original.

Secondly, there is an optimal value, or small range of values, for the seed length

which produces the best results for any given length I. If the seed length is too low, the

result is too uncorrelated and the sound too 'white'. For too high a value of L, the

output contains sections in which portions of the original are looped, causing a

buzzing sound. Increasing L beyond a threshold produces a result that is merely the

original time series being continuously looped. The relationship between the

behaviour of the PGA and L is dependent on the length of the original time series, I.

The output loops the input for smaller values of L when I is smaller. This is to be

expected since the chance of finding more occurrences of the same subsequences in

the original decreases as its length decreases.


Figure 8.5 Time domain plots of the original roomtone showing 300 (left) and 3000 (right) samples.

Figure 8.6 Time domain plots of output time series when (a) I=300, L=1, (b) I=3000, L=3, and (c) I=300, L=4.


Figure 8.7 Comparison between original (left) and synthetic time series (right) showing: (a)&(b) time domain plots, (c)&(d) power spectral densities calculated by averaging eleven 4096-point FFTs, and (e)&(f) amplitude histograms, P(a) against normalised amplitude a, calculated from 30,000 samples.

Now that a range of values for I and L have been determined that give successful

operation of the PGA with a roomtone as input, the algorithm is tried with three other

roomtones with different qualities. In each case, the length of the original used is

I=30,000 and the seed length is varied to find the resulting synthetic time series that is

most like the original.

The first of the roomtones is described as 'laboratory roomtone' and is a recording

of a fairly large workspace where a number of pieces of electrical equipment are in

operation. A number of people were also present and a window was open at the time of recording, both of which contribute discrete artefacts to the sound, such as clunks and passing traffic. This roomtone has been chosen because it is not so


pure and steady state as that used in the previous experiment. The best result was

found for the case where L=3. This sound, and the original roomtone, can be heard as

Sounds 40 and 41 respectively. As can be heard, the result is not very good. This is a

consequence of the discrete artefacts present in the original, which mean that it does not satisfy the stationarity requirement. The action of the PGA is to repeat parts of

these artefacts with timing and spacing that is different from the way in which they

appear in the original. The result sounds like a 'minced' version of the original.

The other two roomtones used are, like that of the first experiment, steady-state sounds, but ones with different perceived qualities [ssl89]. One of these is a

deep, rumble-like sound, the other is higher in frequency and includes a pronounced

high frequency tonal drone. These two sounds and the best synthetic versions

produced with the PGA can be heard as Sounds 42 - 45. Table 8.5 shows a summary

of this set of experiments.

laboratory roomtone (Sounds 40 and 41), I = 30,000, L = 3: poor result because of discrete artefacts in the original.
deep rumble-like roomtone (Sounds 42 and 43), I = 30,000, L = 4: very good; result almost indistinguishable from the original.
roomtone with drone (Sounds 44 and 45), I = 30,000, L = 4: poor result because of the tonal component in the original.

Table 8.5 Summary of results for the PGA used with other roomtones having different qualities.

In the final set of experiments, four other naturally occurring, steady state sounds

which are not roomtones are used as input to the PGA. The four sounds are: the sound

of a river; wind noise; the sound of audience applause; and the sound of a rainforest.

These sounds have been chosen because they are fairly constant and steady-state, and

are sounds that are likely to occur as background sounds in film sound tracks. Again,

the length of the original time series in each case is set at I=30,000 and the seed length

is varied to find the best-sounding result. The originals and the results can be heard as Sounds 46 to 53 and are summarised in Table 8.6.


river (Sounds 46 and 47), I = 30,000, L = 3: good, but some audible looping of parts of the original.
wind noise (Sounds 48 and 49), I = 30,000, L = 3: good; retains some of the flapping, fluttering sound of the original.
applause (Sounds 50 and 51), I = 30,000, L = 2: very good; almost indistinguishable from the original.
rainforest (Sounds 52 and 53), I = 30,000, L = 4: o.k., but timing and spacing of artefacts in the original are not retained.

Table 8.6 Summary of results obtained with the PGA and a variety of other background sounds.

8.6. Conclusions

In this chapter, an algorithm, the PGA, has been presented that offers a solution to

the roomtone problem described in Chapter 2. The PGA is an implementation of a

Markov model that is based on the same framework as that of the previous chapter.

That is, given an original time series that is presumed to be a realisation of a stationary

random process, the model generates a synthetic version that preserves the Lth-order

jpdf of the original. This is achieved by constructing an irreducible, positive recurrent,

first-order Markov chain that acts on the space of embedded vectors of the original

time series. This type of Markov process is guaranteed to possess a stationary

distribution, i.e. an invariant measure. The transition matrix of the Markov chain is

constructed with reference to the transitions of the embedded vectors of the original

time series. The stationary distribution of the Markov chain then models the

embedded measure of the original and therefore preserves the Lth-order jpdf. The

PGA model is therefore like the model presented in the previous chapter where the

mapping of a chaotic system is constructed from the embedded vector transitions such

that its invariant measure models the invariant measure of the embedded system.
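To make the construction concrete, a minimal sketch of the idea behind the PGA is given below (Python). It treats the loop-windowed original as circular, so that every length-L context has at least one recorded successor, and stores the transition structure as successor lists rather than an explicit matrix; it is an illustration of the principle, not the memory-efficient implementation presented earlier in this chapter.

import random
from collections import defaultdict

def pga(x, L, n_out, seed=0):
    # Build the successor lists: each occurrence of an L-sample context
    # x[i], ..., x[i+L-1] contributes the sample that follows it.  The
    # original is treated as circular, which is consistent with the
    # loop-windowing applied to the extracts.
    rng = random.Random(seed)
    N = len(x)
    successors = defaultdict(list)
    for i in range(N):
        ctx = tuple(x[(i + j) % N] for j in range(L))
        successors[ctx].append(x[(i + L) % N])

    # Seed with a randomly chosen context from the original, then walk
    # the Markov chain: look up the current context, choose one of its
    # recorded successors at random, emit it, and shift the context.
    start = rng.randrange(N)
    context = [x[(start + j) % N] for j in range(L)]
    out = list(context)
    while len(out) < n_out:
        nxt = rng.choice(successors[tuple(context)])
        out.append(nxt)
        context = context[1:] + [nxt]
    return out

When L is large, most contexts occur only once, each look-up has a single successor, and the output simply replays the original; this corresponds to the looping behaviour observed in Table 8.4 for the larger seed lengths.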

The PGA offers a solution to the roomtone problem because it allows unlimited

quantities of a synthetic roomtone to be generated from less than one second of the

original. The quality of the results, however, depends on the nature of the original

sound. When the PGA is used with original sounds that are perceived to be constant

and steady state, the resulting synthetic time series are nearly indistinguishable from

the originals. This has been shown to be the case for two roomtones, and the sound of


audience applause. Good results, in which there are slight differences between the

original and synthetic versions, were found for the sound of a river, wind noise and

the sound of a rainforest. It was found that the PGA performed badly when the

original sound is not constant, because of the presence of discrete artefacts, or when it

contains tonal components. This is to be expected for sounds containing discrete

artefacts because their time series do not adequately satisfy the condition of

stationarity required by the theory. It can therefore be presumed that this condition is

met when the sound is perceived as constant and steady state as is the case for those

sounds for which the PGA performed very well. Poor performance is also to be

expected for sounds with tonal components, as it is not guaranteed that the model will

preserve the long-term correlations present in their time series since only the Lth order

jpdf is preserved.

A novel implementation of the PGA has been presented that uses acceptable

amounts of computer memory and produces results in an acceptable time. Generally,

for an original time series of length I, the amount of memory used by the algorithm is

approximately 8I bytes. For good results, that is, ones that sound like the original, it was found that I should be of the order of tens of thousands (8 × 30,000 = 240,000 bytes). Consequently, the memory use is up to about one quarter of a megabyte. On average, the time taken to generate

1 second of a synthetic time series is approximately 40 seconds using a 66 MHz 486 PC. This algorithm is therefore about 10 times slower than the synthetic system

described in the previous chapter.

The PGA model presented in this chapter is also a step towards the solution of the

inverse problem for the RIA version of an IFS. In the introduction it was proposed that

this inverse problem can be broken into two steps. The first step consists of

constructing a Markov process possessing an invariant distribution from an original

time series. The RIA of an IFS defines a Markov process possessing an invariant

distribution, or measure. Therefore, the second step of the inverse problem is to find a

set of IFS contraction mappings and associated probabilities that define a similar

Markov process to that constructed from the original time series. The PGA, therefore,

offers a solution to the first step of this process. It allows the construction of a Markov

chain that models certain original time series to a high degree of perceptual accuracy.

Further work is therefore required to address the second step of the problem and find a

way of extracting a set of IFS mappings and probabilities from the transition matrix

that defines the Markov chain.

Another suggested area of further work is into the possibilities of modifying

sounds once they have been successfully modelled with the PGA. The possibilities

include modifying the probability weightings involved during the random choice of


sub-string occurrences, or transforming the transition probability matrix M. It is

expected that a given sound may then be subtly altered since modifying the

probabilities allows control over the invariant statistics of the sound time series which

are relevant to its perceived qualities. It may also be possible to form hybrid sounds by

combining the transition matrices obtained from several different sounds.
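As a purely speculative illustration of this last suggestion: if the transition matrices of two sounds could be defined over a common state set, a convex mixture of them would remain a valid (row-stochastic) transition matrix. A Python/NumPy sketch follows; aligning the state sets of two different sounds is the difficult part and is not addressed here.

import numpy as np

def hybrid_transition_matrix(M1, M2, alpha=0.5):
    # Convex mixture of two row-stochastic matrices over the same state
    # set; rows are renormalised to guard against rounding error.
    M = alpha * np.asarray(M1) + (1.0 - alpha) * np.asarray(M2)
    return M / M.sum(axis=1, keepdims=True)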

Finally, some preliminary experiments using non-steady-state sounds with the

PGA, for example speech, indicate that curious special effects can be produced that

may be useful for creative purposes. As an example, some original speech and a

processed version may be heard as Sounds 54 and 55. (For this example, I=30,000 and

L=4).


Chapter 9

Summary and Conclusions

This thesis has presented an exploration of the idea of applying chaos theory and

fractal geometry to the problem of modelling sound. It is believed to be the first

substantial investigation of this idea.

The thesis began with the suggestion that since chaos theory and fractal geometry

are significant new developments that are having a substantial impact on many fields

in both the arts and sciences, they may be applied to the problem of modelling sound.

This idea was emphasised with the example of computer images generated with

chaotic and fractal models. These are examples of simple systems that can both model

naturally occurring images and generate complex abstract forms.

Chapter 2 discussed what is meant by a sound model. For this work, it is taken to

be a computer-based model that represents sound for a practical purpose. The uses of

such models are considered to be creative ones, such as music composition and film

sound-track editing. A specific application was described called the 'roomtone

problem' in which the desire is to extrapolate sound; that is, to produce a greater

quantity of a short original sound such that it is perceptually the same as the original.

From a consideration of these applications, a functional definition of a sound model

has been developed. This is that the model should be able to represent the perceived

characteristics of a naturally occurring sound, and that the model operates with less

parameter data than sound data. The parameter data may then be stored instead of the

sound, thereby achieving data compression, and/or the parameters can be used as

'handles' with which the sound may be manipulated. Also, the model, which is

preferably simple, may be used to generate new, abstract sounds. Examples of

conventional models which satisfy these requirements were reviewed, and it was

found that the only ones existing in the literature are ones for musical sound or

speech.

Chapter 3 presented a review of chaos theory and fractal geometry with the aim of

providing a basis to the theory used in later chapters. The first issue considered was

that of the significance of chaos and fractals. Chaos describes a class of dynamical

behaviour exhibited by nonlinear systems. It is characterised by two main features: it

may be complex despite the system being simple; and it is unpredictable despite being

governed by deterministic rules. Fractals are geometric objects not encompassed by

traditional, Euclidean geometry. They exhibit self-similarity and have space-filling


properties, unlike Euclidean objects, which is reflected in their having non-integer dimensions. Of great significance is that chaos and fractals provide effective models

for many naturally occurring phenomena. These phenomena are typically complex and

irregular and have not, until now, been understood nor readily modelled in other ways.

A theoretical framework for chaos was developed by applying geometry to the

state space of a dynamical system. This introduces the central concept of the strange

attractor. The importance of the strange attractor is that it is a geometric embodiment

of chaotic behaviour and is itself a fractal object. The special dynamical properties of

chaotic systems are then understood to be related to the unusual geometric properties

of the fractal strange attractor. This treatment was extended by considering the

description of the statistical behaviour of chaotic dynamics with an invariant measure

whose support set is the strange attractor in state space.

Also introduced in Chapter 3 were Barnsley's Iterated Function Systems (IFS). It

was discussed how IFS provide a well understood framework for manipulating

complex fractal strange attractors with simple systems that have several advantages.

For example, they are robust for computer implementation and have already been

shown to effectively model natural images. It was also shown how the three

equivalent views of an IFS, those of contraction mappings, Shift Dynamical Systems

(SDS) and Random Iteration Algorithms (RIA) unite fractal geometry, chaotic

dynamics and Markov processes. Other features of chaos theory were then reviewed

including bifurcation, attractor visualisation, and fractal dimension.

Chapter 4 presented a discussion of the idea of applying chaos and fractals to the

problem of modelling sound. It was argued that chaos and fractals appear ideal for use

in sound models because their properties coincide with the main functional elements

required of a sound model. That is, they can represent naturally occurring phenomena,

and generate appealing abstract forms with simple systems requiring few data

parameters. This idea prompted two specific questions. Firstly, is there any evidence

for a connection between naturally occurring sound and chaos or fractals? Secondly,

how can sound be represented with chaos or fractals? In answer to the first question,

several pieces of positive evidence were presented which suggest that such a

connection does exist. The evidence includes the existence of bifurcation sequences in

musical instruments such as woodwinds and gongs; the determination of fractal

dimensions for speech sound waveforms; and the fact that wind noise and roomtones

are examples of 1/f noise and are therefore statistically self-affine, or fractal, signals.

In response to the second question, it was suggested that a sound may be

represented in either of two ways with a strange attractor: by representing the

dynamics of a sound and therefore assuming it is the product of a chaotic system; or


by representing the static waveform of the sound and therefore assuming that the

waveform is a fractal object. It is these suggestions that form the main concern of this

work and were the subject of investigation of the rest of the thesis. Chapters 5,6,7 and

8 presented four different experimental investigations and contain original

contributions towards the solution of the problem.

Chapter 5 presented a variety of experiments with a synthesis-only technique that

allows strange attractors to be turned into complex abstract sound waveforms. The

sounds are generated from simple systems that require small amounts of parameter

data. The model is based on Barnsley's Fractal Interpolation Functions (FIF) which are

a class of IFS attractors that are self-affine waveforms. The chapter began by

reviewing the theory of FIFs and developing an algorithm suitable for generating

digital audio. This algorithm was then used with a variety of input parameter sets to

gain an understanding of the nature of the model. The most significant result of this

chapter was that the FIF model can be used to generate a new class of sounds which

have both rhythm and timbre. The sounds are interesting, unusual and unlike those

produced with conventional synthesis techniques. They are considered to have

potential use for computer music composition. They are novel because the waveforms

are composed of patterns that are common to a range of time scales. The same

patterns are then perceived differently as both rhythm and timbre. Any modification to

the input parameters effects a change to both the nature of the rhythm and the quality

of the sound.
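A minimal sketch of the underlying FIF construction is given below (Python/NumPy). It uses Barnsley's standard affine maps and the Random Iteration Algorithm to approximate the graph of an FIF passing through a given set of interpolation points; the parameter sets explored in Chapter 5, and the deterministic algorithm developed there for generating digital audio, are not reproduced here.

import numpy as np

def fif_points(xs, ys, d, n_points=20000, seed=0):
    # Fractal Interpolation Function via the Random Iteration Algorithm.
    # xs, ys are the interpolation points (xs strictly increasing) and d
    # the free vertical scaling factors, one per interval, with |d[n]| < 1.
    xs, ys, d = map(np.asarray, (xs, ys, d))
    X, Y = xs[-1] - xs[0], ys[-1] - ys[0]
    # Coefficients of the affine maps w_n(x, y) = (a x + e, c x + d y + f),
    # chosen so that each map sends the whole x-interval onto the n-th
    # sub-interval and the end points onto consecutive interpolation points.
    a = (xs[1:] - xs[:-1]) / X
    e = (xs[-1] * xs[:-1] - xs[0] * xs[1:]) / X
    c = (ys[1:] - ys[:-1] - d * Y) / X
    f = (xs[-1] * ys[:-1] - xs[0] * ys[1:] - d * (xs[-1] * ys[0] - xs[0] * ys[-1])) / X

    rng = np.random.default_rng(seed)
    pts = np.empty((n_points, 2))
    x, y = xs[0], ys[0]
    for i in range(n_points):
        n = rng.integers(len(a))        # pick one of the maps at random
        x, y = a[n] * x + e[n], c[n] * x + d[n] * y + f[n]
        pts[i] = x, y
    return pts   # points lying (approximately) on the graph of the FIF

For example, fif_points([0, 0.4, 1], [0, 1, 0], [0.5, -0.5]) (hypothetical parameter values) generates points on a self-affine waveform interpolating the three given points, with the vertical scaling factors controlling its roughness.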

The FIF model was then incorporated into a more elaborate scheme where the

input parameters are controlled by a genetic model. The user then acts as 'artificial

selection' and can evolve sounds by accumulating small changes due to the

recombination and mutation of the parameters. This was found to be an effective and

compelling way of exploring the space of FIF sounds.

It was concluded that the products of the FIF model demonstrate that there is an

acoustic equivalent to the abstract fractal images such as the Julia set. That is,

appealing abstract fractal sounds can be readily generated from simple systems.

Chapter 6 continued with an investigation of FIFs, but as a means to represent

naturally occurring sound as part of an analysis/synthesis model. It was therefore an

investigation of the FIF inverse problem; given an original sound time series, find a

set of FIF parameters that specify an FIF that is similar to the original. Initial

investigations into this problem using interpolation points derived from the original

time series led to the conclusion that the problem is a difficult one and a systematic

approach is required.


The rest of the chapter presented an investigation into work by David Mazel found

in the literature. Mazel presents a number of FIF-based models for time series and

associated analysis algorithms which he claims offer solutions to the inverse problem.

An analysis of Mazel's results was conducted by comparing the

degradation/compression performance with that expected for simple amplitude

requantisation. It was found, however, that the performance of Mazel's models and inverse algorithms is not significantly better than that of requantisation. To confirm this

finding, one of his inverse algorithms was reimplemented for use with sound time

series. The algorithm used is that of the self-affine model and was chosen because it is

directly applicable to the FIF model used in this thesis. Also, Mazel only presents one

result for this algorithm, which is the best of all his results. It was therefore decided

that more results are needed to reach a conclusion on the ability of the algorithm. The

reimplementation of the algorithm with a number of sounds as the original time series

yielded poor results. Inspection of the operation of the algorithm, however, led to a

proposition as to why the results were poor. The proposition was tested by modifying

the algorithm to counteract the problem. Some of the results from the modified

algorithm were then found to be significantly better than those of the unmodified

algorithm, Mazel's other models/algorithms, and amplitude requantisation. The best

results were found to occur for those sounds that have 1/f power spectral densities

such as wind noise and roomtones. It was concluded that this is a satisfying result as it

shows the model is exploiting the fractal redundancy of the statistically self-similar 1/f

noises.

Chapter 7 presented an account of the main approach taken in this thesis to the

problem of modelling sound with strange attractors. It is the other approach proposed

in Chapter 4 where the dynamics of a sound are represented by a strange attractor and

associated invariant measure. The chapter began by presenting the necessary

theoretical framework of how a time series obtained from observing a chaotic system

relates to that system. This involves the theory of embedding which shows how an

embedded system which shadows the original system may be constructed from a set of

time-delayed values of the observed time series. Using the theory of embedding, an

analysis/synthesis model was proposed where the dynamics of the sound are modelled

in embedded state space by a synthetic chaotic system whose attractor and associated

measure match those of the embedded system. It was argued that modelling the

embedded attractor and measure is equivalent to modelling the jpdf of the original

time series. The dynamics of a sound can therefore be modelled by finding a solution

to the inverse problem. This is: from the embedded sound time series, find a synthetic

system mapping that defines an attractor and measure that are similar to the embedded

ones.
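For reference, the delay embedding used throughout is easily stated in code (Python/NumPy sketch; the embedding dimension m and delay tau below are illustrative, since their selection is discussed in Chapter 7 itself).

import numpy as np

def delay_embed(x, m, tau=1):
    # Form the m-dimensional embedded vectors
    # [x(t), x(t + tau), ..., x(t + (m - 1) tau)]
    # from a scalar time series x.
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.stack([x[i * tau:i * tau + n] for i in range(m)], axis=1)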


A novel solution to this inverse problem was then proposed based on work found

in the literature on time series prediction. It involves partitioning embedded state

space and finding a locally linear function for each partition such that a nonlinear

synthetic system mapping is defined. This procedure forms the analysis half of the

model and iteration of the piece-wise linear synthetic system mapping then forms the

synthesis half. The linear functions are fitted using a least squares procedure that

minimises the function's prediction error with reference to an embedded extract of the

original time series.
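The analysis/synthesis idea can be sketched as follows (Python/NumPy). The partitioning used here, nearest of K reference vectors drawn from the data, and the parameter values are stand-ins for illustration; the actual partitioning scheme and least-squares procedure of Chapter 7 are not reproduced.

import numpy as np

def fit_local_linear(x, m=3, K=16, seed=0):
    # Analysis: embed the series, partition the embedded vectors, and fit
    # an affine one-step predictor of the next sample in each partition
    # by least squares.
    x = np.asarray(x, dtype=float)
    V = np.stack([x[i:len(x) - m + i] for i in range(m)], axis=1)  # embedded vectors
    targets = x[m:]                                   # sample following each vector
    rng = np.random.default_rng(seed)
    refs = V[rng.choice(len(V), K, replace=False)]    # partition reference points
    labels = np.argmin(((V[:, None, :] - refs[None]) ** 2).sum(-1), axis=1)
    models = []
    for k in range(K):
        Vk, tk = V[labels == k], targets[labels == k]
        if len(tk) < m + 1:                           # too few points to fit
            models.append(np.zeros(m + 1))
            continue
        A = np.hstack([Vk, np.ones((len(Vk), 1))])    # affine design matrix
        coef, *_ = np.linalg.lstsq(A, tk, rcond=None)
        models.append(coef)
    return refs, np.array(models)

def synthesise(v0, refs, models, n_out):
    # Synthesis: iterate the piecewise-linear mapping from an initial
    # embedded vector v0 of length m, emitting one new sample per step.
    v = list(v0)
    out = list(v)
    for _ in range(n_out):
        k = int(np.argmin(((refs - np.array(v)) ** 2).sum(-1)))
        nxt = float(np.dot(models[k][:-1], v) + models[k][-1])
        out.append(nxt)
        v = v[1:] + [nxt]
    return out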

The model was tested with a time series derived from the numerical integration of

the Lorenz chaotic system. It was found that, for a range of analysis parameters, the

synthetic system possesses a strange attractor that is very similar to that of the

embedded system derived from the original time series. Trials with ranges of analysis

parameters led to an understanding of the nature of the analysis scheme and showed

that the degree of similarity between the embedded attractor and the synthetic version

relates to the size of the prediction error. Having confirmed that the model can work

with simulated chaotic data, it was then tried with a range of naturally occurring sound

time series. These were chosen for their steady-state qualities and the belief that they

are generated by nonlinear dynamical systems. It was found that the scheme is capable

of modelling the sound of 'air noise' to a high degree of perceived similarity. This is

considered to be the first demonstration that a chaotic system is capable of modelling

sound by representing its dynamics with a strange attractor. A similarly good result

was also obtained for a tuba sound.

The results obtained using other sounds, although not so good at preserving the

perceived qualities of the original, were shown to preserve other features such as

embedded attractor shape, power spectral density and amplitude pdf. This was found

for the sounds of the wind, a gong, and a saxophone. It was found that the relative

performance for the different sounds was reflected by the relative values of the one-

step prediction errors. The computational complexity and size of the model were

investigated showing that the model is simple to implement, but requires large

amounts of parameter data. It was suggested that further work is required on the

optimisation of the analysis procedure with respect to the size of the model.

It was concluded that the results obtained with this model are good enough to

confirm the feasibility of the approach and to warrant further investigation. The strong

link between the model and Barnsley's IFS was also discussed. A number of further

experiments and immediate improvements were suggested as well as ideas for a

longer-term strategy.


Chapter 8 presented the final experimental investigation which is an approach to

the solution of the roomtone problem based on the theoretical framework of Chapter

7. Instead of modelling the embedded attractor and measure of an original time series

with a deterministic nonlinear dynamical system, a Markov model was proposed. This

idea came from a consideration of the equivalence between the deterministic SDS

version of an IFS and the RIA version, which defines a Markov process. The inverse

problem for the RIA version of an IFS was considered and a two stage solution

proposed. The first stage involves determining a Markov chain from a given

realisation of a stationary process, then the second stage involves obtaining the

mappings and associated probabilities for the RIA of an IFS from the Markov

transition matrix.

The Poetry Generation Algorithm, found in the literature, was then presented

which allows unlimited quantities of synthetic text to be generated in the style of some

given original passage. It was argued that this algorithm, if modified to work with

digital audio, presents a solution to the roomtone problem. It was then shown that the

PGA works by determining a Markov chain that operates on the set of embedded

sequences of the original text sequence and that possess an invariant measure. The

PGA therefore also presents a solution to the first half of the proposed RIA inverse

algorithm.

An implementation of the PGA was then presented that works with digital audio.

Some modifications to the original algorithm were made so as to produce results in an

acceptable time and using acceptable amounts of computer memory. Results were

then presented for the PGA used with a variety of roomtone time-series. It was found

that the PGA provides a solution to the roomtone problem for certain types of sound.

That is, unlimited quantities of a sound may be generated from less than a second of

an original such that the synthetic version sounds almost indistinguishable from the

original. It was concluded that the PGA works well when the original sound is steady-

state, and therefore presumed to be stationary. Good results were also presented for

other background sounds such as that of a river and that of audience applause.

My overall conclusion to this work is that chaos theory and fractal geometry

provide a rich source of ideas and techniques to be applied to the topic of modelling

sound with a computer for creative uses. Chaos and fractals are, in their own right,

compelling and inspiring subjects that have great intuitive appeal. It is also easy to see

how they might relate to a wide range of everyday complex phenomena. When

beginning this work, I felt intuitively that there must be a strong relationship between

chaos/fractals and naturally occurring sound. This intuition was a product of the

cogent nature of chaos theory and of observations of the complex and irregular sounds


that occur in nature that are not musical or speech sounds. The intuition was also that

the development of sound models has paralleled the conventional treatment of

dynamics in science and engineering. That is, complex, irregular and nonlinear

phenomena have received less attention than simple, regular, linear phenomena for the

reason that the theory and models of the latter are well developed, while for the former

they have been less well understood. Hence the emphasis on modelling regular

musical sound with linear systems and using Fourier theory. While this approach has

been very successful, its inadequacies are well described by the quote from Iannis

Xenakis given in the Introduction. It was therefore hoped that applying chaos and

fractals to sound might provide new models and techniques that would complement

existing ones, but that would be nonlinear and concern complex and irregular sound.

Given the results of this thesis, my conclusion is that the original intuition was

correct and that I have initiated a number of useful models and techniques. Some of

these are ready for use, for example the FIF synthesis technique and the roomtone

model, while others provide material for further research, such as the predictive

chaotic model. The results on which this conclusion is based may be further

summarised as follows. Firstly, the assembled evidence indicates that there are a

number of strong connections between naturally occurring sound and chaos and

fractals. Sound made from the complex strange attractors of very simple systems

(FIFs) can have interesting and musically useful properties. Sound waveforms with

statistically self-affine properties may be compressed by representing them with

fractal waveforms (which are again FIFs). The dynamics of naturally occurring chaotic

sound, such as 'air noise', may be modelled via a strange attractor with a synthetic

chaotic system such that the perceived qualities of the original are preserved.

Although the chaotic system requires many parameters, it is very simple to implement,

and there are good reasons to expect that the number of parameters can be reduced. In

the case where the sound is predominantly regular, but which includes irregularities

(the tuba sound), the chaotic model may be made very simple. Steady-state ambient

noises may be convincingly modelled with a stochastic system that has strong

similarities to an alternative view of a chaotic system (the RIA variant of an IFS). It is

expected that this model may be developed to exploit this connection.

While these results satisfy many of the questions posed at the outset, it is felt that

there are considerable limitations to the models developed. These limitations concern

the type of sound signals that the models are suited to. For example, the FIF model

represents signals that have exactly self-affine waveforms while the predictive chaotic

model and the PG algorithm assume stationary behaviour with invariant measures in

state space. Although these models have been shown to be capable of representing

sounds, these sounds form a limited subset of those occurring naturally. It is unlikely,


for example, that a sound will be exactly self-affine, and is more likely to be

statistically self-affine. There are, however, many other fractal models that may be

used. There are several partial self-affine variants of the FIF model and a range of

statistical models that have been used for image modelling.

Also, sound is typically not steady-state and stationary as required by the

predictive chaotic model. It is more likely to be time-varying or to be a discrete event

with transient behaviour. In fact it is these time-varying qualities of sound that are of

enormous perceptual significance, a fact widely acknowledged by computer

musicians. Again, however, there are many ways in which the chaotic model may be

developed to represent time-varying behaviour. One idea has already been mentioned

in the Conclusions of Chapter 7 of developing meta-dynamical systems in which

nonlinear dynamics are responsible for behaviour on several time scales. There also

many other nonlinear models which feature in what has become known collectively as

'complexity theory' that could be investigated. One example is that of cellular

automata (CA) which are nonlinear dynamical systems capable of many types of

behaviour. One type is known as 'class IV' behaviour which is poised between order

and chaos and has been described as behaviour having effectively very long transients.

Again, observing this type of behaviour, the intuition is that class IV CA may be

capable of capturing the dynamics of natural sound.

The conclusion drawn, then, is that this work has provided an initial investigation

into the idea of using nonlinear dynamics to model sound with enough positive results

to warrant further investigation. This work has concentrated on several types of

nonlinear model that are strictly fractal or strictly chaotic. What is then needed are

developments which modify the models to be better suited to the overall dynamics of

naturally occurring sound. For example, instead of choosing an unvarying part of a

gong sound and trying to fit a steady-state chaotic model to it, it would be better to

have a model that is based on transient nonlinear dynamics.

In any case, another general conclusion that can be made and which it is expected

would often apply, is that the problem in hand is a difficult one. The desire, in general,

is to have a simple, compact, manageable model for a complex sound. This therefore

involves some kind of solution to the 'inverse problem'. This always involves finding a

way of going from a complex data set to a simple algorithmic description of it.

Intuitively, this seems like a difficult thing to do. Experience has shown that there is

no problem in producing complex data sets from simple nonlinear systems, but that to

fit a simple system to any given, naturally occurring data set is difficult. It has been

suggested that the reason for this has something to do with algorithmic complexity

[mant92].


The algorithmic complexity of a set is a measure of its information content in the

context of its generation - see [chai88]. Specifically, it is the length of the shortest

program that, when executed on a Universal Turing Machine, generates the original

set. Typically, both the original set and the program code are represented as binary

strings. Their information content, in bits, is then simply their respective lengths. For a

truly random binary sequence, which has no redundancy, each of the digits has to be

stored explicitly in the program as there is no other means of generating them. The

algorithmic complexity is therefore approximately equal to the length of the original

sequence. At the other extreme, a sequence that is made up of all 1's, and therefore has

high redundancy, may be represented by a simple program containing a loop. The size of this program will therefore be much smaller than the size of the original sequence.
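The contrast can be made concrete with a computable stand-in for algorithmic complexity, namely the length of a compressed description (Python sketch). This is only an analogy, since algorithmic complexity itself cannot be computed, as discussed below.

import random, zlib

ones = b"1" * 30000                                        # highly redundant
rand = bytes(random.getrandbits(8) for _ in range(30000))  # essentially incompressible

print(len(zlib.compress(ones)))  # a few tens of bytes: the 'loop' program
print(len(zlib.compress(rand)))  # roughly 30,000 bytes: stored explicitly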

Take, for example, the inverse problem for IFS which can be seen in relation to

the concept of algorithmic complexity. The solution of the inverse problem requires a

means of computing, or at least approximating, the algorithmic complexity of a given

set. That is, consider that the IFS generation algorithm and associated parameters form

the program code. Running this program then specifies a complex IFS attractor. The

inverse problem is to find the fewest IFS parameters to specify an attractor of a

desired form. The amount of data required to store the model and the parameters is

then an approximation to the algorithmic complexity of the set being modelled.

The difficulty, however, is that it is not possible, in general, to obtain the

algorithmic complexity of some given set [ford86 and mant89]. This is because the

problem is undecidable - it is equivalent to the Turing halting problem, a form of

Gödel's Incompleteness Theorem - see [hofs79]. That is, it is not possible to tell in

advance whether a program for computing the algorithmic complexity will eventually

stop or not. The only possible option is to try the program and see. This may explain

why it is difficult to find good solutions to inverse problems. A good solution will

always imply that the algorithmic complexity is being calculated which is, inherently,

problematical. This, however, is a problem for solutions to general inverse problems.

In practice, a solution is typically sought for a specific case where a certain type of

redundancy is being exploited by the model. There is no doubt that the redundancies

implied by chaos theory and fractal geometry, as well as occurring in a variety of

natural phenomena, are relevant to the problem of modelling sound.

On the basis of the results outlined in this thesis, I feel that applying chaos theory

and fractal geometry to sound modelling has considerable future potential. I imagine

that it may provide a basis for modelling irregular sound in much the same way as

linear theory has for the regular case. Further, powerful, computer-based tools and

techniques could be developed for anyone interested in the creative use of sound.