THE SMUSE: AN EMBODIED COGNITION APPROACH TO INTERACTIVE MUSIC COMPOSITION

Name of author
Address - Line 1
Address - Line 2
Address - Line 3

ABSTRACT

Computer-based music systems have evolved from computer-aided composition, which transposed the traditional paradigms of music composition to the digital realm, to complex feedback systems that allow for rich multimodal interactions. Yet many interactive music systems still rely on principles that are outdated in the light of modern situated cognitive systems design. Moreover, the role of human emotional feedback, arguably an important feature of musical experience, is rarely taken into account in the interaction loop. We propose to address these limitations by introducing a novel situated synthetic interactive composition system called the SMuSe (for Situated Music Server). The SMuSe is based on the principles of parallelism, situatedness, emergence and emotional feedback, and is built on a cognitively plausible architecture. It makes it possible to address questions at the intersection of music perception and cognition while serving as a creative tool for interactive music composition.

1. BACKGROUND

Interactivity has become a standard feature of many multimedia systems and plays a fundamental role in contemporary art practice. In particular, real-time human/machine interactive music systems are now omnipresent as both composition and live performance tools. Yet the term "interactive music system" is often misused. The interaction that takes place between a human and a system is a process that includes both control and feedback, where real-world actions are interpreted into the virtual domain of the system [4]. If some parts of the interaction loop are missing (for instance the cognitive level in Figure 1), the system is merely reactive rather than interactive. In most current human-computer music systems, the human agent interacts whereas the machine, for lack of cognitive modeling, only reacts. Although the term interactivity is widely used in the new media arts, most systems are simply reactive systems [4]. Furthermore, the cognitive modeling of interactive multimedia systems, when it exists, often relies on a classical cognitive science approach to artificial systems in which the different modules (e.g. perception, memory, action) are studied separately. This approach has since been challenged by modern cognitive science, which emphasizes the crucial role of the perception-action loop, the building of cognitive artifacts, and the interaction of the system with its environment [37]. In this paper we propose a novel approach to interactive music system design informed by modern cognitive science and present an implementation of such a system called the SMuSe.

Figure 1. Human-machine interaction (adapted from [4])

2. FROM EMBODIED COGNITIVE SCIENCE TO MUSIC SYSTEMS DESIGN

2.1. Classical View

Looking at the evolution of our understanding of cognitive systems in parallel with the evolution of music composition practices gives a particularly interesting perspective on some limitations of current interactive music systems.

The classical approach to cognitive science assumes that external behavior is mediated by internal representations [6] and that cognition is essentially the manipulation of these mental representations by sets of rules. It mainly relies on the sense-think-act framework [27], where future actions are planned according to perceptual information.

Interestingly, a parallel can be drawn between classical cognitive science and the development of classical music, which also relies heavily on formal structures. It puts the emphasis on internal processes (composition theory) to the detriment of the environment or the body, with centralized control of the performance (the conductor).



Disembodiment in classical music composition can be seen at several levels. Firstly, by training, the composer is used to composing in his head and translating his mental representations into an abstract musical representation: the score. Secondly, the score is traditionally interpreted live by the orchestra's conductor, who controls the main aspects of the musical interpretation, whereas the orchestra musicians themselves are left with relatively little interpretative freedom. Moreover, the role of the audience as an active actor in a musical performance is mostly neglected.

    2.2. Modern View

An alternative to classical cognitive science is the connectionist approach, which tries to build biologically plausible systems using neural networks. Unlike more traditional digital computation models based on serial processing and explicit manipulation of symbols, connectionist networks allow for fast parallel computation. Moreover, they do not rely on explicit rules but on emergent phenomena stemming from the interaction between simple neural units. Another related approach, called embodied cognitive science, puts the emphasis on the influence of the environment on internal processes. In some sense it replaced the view of cognition as representation by the view that cognition is an active process involving an agent acting in the environment. Consequently, the complexity of a generated structure is not only the result of the complexity of the underlying system, but is partly due to the complexity of its environment [34].

A piece that gives a good illustration of the principles of situatedness, distributed processing and emergence is In C by Terry Riley. In this piece, musicians are given a set of pitch sequences composed in advance, but each musician is left in charge of choosing when to start playing and repeating these sequences. The piece is formed by the combination of the decisions of each independent musician, who makes her decision based on the collective musical output that emerges from all the possible variations.

Following the recent evolution of our understanding of cognitive systems, we emphasize the crucial role of emergence, distributed processes and situatedness (as opposed to rule-based, serial, central, internal models) in the design of interactive music composition systems.

    2.3. Human-In-The-Loop

With the advent of new sensor technologies, the influence of the environment on music systems can be sensed via both explicit and implicit interfaces, which give access to the behavioral and physiological state of the user. Music is often referred to as the language of emotion, hence human emotions seem to be a natural feedback channel to take into account in the design of a situated music system. We believe that, in order to be complete, the design of a situated music system should take into consideration the emotional aspects of music.

    2.3.1. Explicit Gestural Interfaces

The advent of new sensing technologies has fostered the development of new kinds of interfaces for musical expression. Graphical user interfaces, tangible interfaces and gesture interfaces have become omnipresent in the design of live music performances and compositions [24]. Most of these are gesture-based interfaces that require explicit, conscious body movements from the user. They can give access to behavioral or self-reported information, but not to the implicit emotional states of the user.

    2.3.2. Implicit Biosignal Interface

Thanks to the development of more robust and accurate biosignal technologies, it is now possible to derive emotion-related information from physiological data and use it as an input to interactive music systems. Although the idea is not new [15, 30], the past few years have witnessed a growing interest from the computer music community in using physiological data such as heart rate, electrodermal activity, electroencephalogram and respiration to generate or transform sound and music. Providing an emotion-based physiological interface is highly relevant for a number of applications including music therapy, diagnosis, interactive gaming, and emotion-aware musical instruments.
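To make the idea of an implicit biosignal interface concrete, the following Python sketch computes a crude arousal index from heart inter-beat intervals and electrodermal activity. It is purely illustrative: the baselines, scaling constants and the equal weighting of the two signals are assumptions made for the example, not values or methods used by the SMuSe.

import numpy as np

def arousal_index(ibi_ms, eda_us, ibi_rest=800.0, eda_rest=2.0):
    # Crude arousal proxy in [0, 1] from inter-beat intervals (ms) and
    # electrodermal activity (microsiemens). Heart rate and skin
    # conductance above their resting baselines both push the index up.
    hr = 60000.0 / np.mean(ibi_ms)                 # mean heart rate (bpm)
    hr_rest = 60000.0 / ibi_rest
    hr_term = np.clip((hr - hr_rest) / 40.0, 0.0, 1.0)
    eda_term = np.clip((np.mean(eda_us) - eda_rest) / 5.0, 0.0, 1.0)
    return 0.5 * hr_term + 0.5 * eda_term

# Slightly elevated heart rate and skin conductance -> moderate arousal
print(arousal_index(ibi_ms=[700, 710, 690], eda_us=[3.5, 3.8, 3.6]))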

    2.3.3. Emotional Mapping

Music and its effect on the listener have long been a subject of fascination and scientific exploration, from the Greeks speculating on the acoustic properties of the voice [14] to Muzak researchers designing soothing elevator music. Music has become an omnipresent part of our day-to-day life. It is well known for affecting human emotional states, and most people enjoy music because of the emotions it evokes. Yet, although emotions seem to be a crucial aspect of music listening and performance, the scientific literature on music and emotion is scarce compared to that on music cognition or perception [21, 10, 17, 5]. The relationship between specific musical parameters and time-varying emotional responses is still not clear. Biofeedback-based interactive music systems appear to be an ideal paradigm to explore the complex relationship between emotion and music.

2.4. Perceptual and Cognitive Models of Musical Representation

What are the most relevant dimensions of music and how should they be represented in the system? Here, we take a cognitive psychology approach and define a set of parameters that are the most salient perceptually and the most meaningful cognitively. Music is a real-world stimulus that is meant to be appreciated by a human listener. It involves a complex set of perceptual and cognitive processes that take place in the central nervous system. The fast advances in the neuroscience of music over the past twenty years have taught us that these processes are partly interdependent, are integrated in time, and involve memory as well as emotional systems [16, 25, 26]. Their study sheds light on the structures and features that are involved in music processing and stand out as being perceptually and cognitively relevant. Experimental studies have found that musical perception happens at three different time scales: the event fusion level, where basic musical events such as pitch, intensity and timbre emerge (around 50 ms); the melodic and rhythmic grouping level, where patterns of those basic events are perceived (around 5 s); and finally the form level (from 5 s to 1 hour), which deals with large-scale sections of music [35] (Figure 2). This hierarchy of three time scales of music processing forms the basis on which we built the SMuSe's music processing chain.

Figure 2. The different levels of sequential grouping for musical material: event fusion, melodic and rhythmic grouping, and formal sectioning (from [35])
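As a rough illustration of this three-level hierarchy, the containment relation between events, groups and sections can be written down as a simple data structure. This is a sketch for the reader, not the SMuSe's actual internal representation; the field names are assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:        # event-fusion level (~50 ms): pitch, intensity, timbre
    pitch: int      # MIDI note number
    velocity: int   # MIDI velocity
    duration: float # seconds

@dataclass
class Group:        # melodic/rhythmic grouping level (~5 s)
    events: List[Event] = field(default_factory=list)

@dataclass
class Section:      # form level (5 s to ~1 hour)
    groups: List[Group] = field(default_factory=list)

motif = Group([Event(60, 80, 0.25), Event(67, 80, 0.25), Event(72, 90, 0.5)])
piece = Section([motif])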

    2.5. Music Processing Modules

Research on the brain substrates underlying music processing has shifted over the last twenty years from a classical view emphasizing a dichotomy between language (supposedly processed in the left hemisphere) and music (respectively the right hemisphere) to a modular view [1]. There is some evidence that music processing modules are organized into two parallel but largely independent submodules that deal with pitch content ("what?") and temporal content ("when?") respectively [26, 18]. This evidence suggests that they can be treated separately in a computational framework. Additionally, studies involving music-related deficits in neurologically impaired individuals (e.g. subjects with amusia who can no longer recognize melodies) have shown that the music faculty is composed of a set of neurally isolable processing components for pitch, loudness and rhythm [25]. The common view is that pitch, rhythm and loudness are first processed separately by the brain and then later form (around 25-50 ms) the impression of a unified musical object [20] (see [16] for a review of the neural basis of music perception). This modularity, as well as the three different levels and time scales of auditory memory (sound, groups, structure), forms a set of basic principles for designing our bio-mimetic music system.

Figure 3. The SMuSe is based on a hierarchy of musical agents

3. A COMPUTATIONAL MODEL BASED ON A SOCIETY OF MUSICAL AGENTS

The architecture of the SMuSe is inspired by neurological evidence. It follows a hierarchical and modular structure, and has been implemented as a set of agents using data-flow programming.

3.1. The SMuSe's Architecture

The SMuSe is built on a hierarchical, bio-mimetic and modular architecture. The musical material is represented at three different hierarchical levels, namely event fusion, event grouping and structure, corresponding to different memory constraints. From the generative point of view, SMuSe modules are divided into time modules ("when") that generate rhythmic patterns of events and content modules ("what") that, for each time event, choose musical material such as pitch and dynamics (Figure 3).
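The split between time ("when") and content ("what") modules can be sketched as follows. This is a toy Python stand-in for the Max/MSP agents of Figure 3, with made-up patterns and choice rules, intended only to show how independent when/what agents combine into a stream of events.

import random

class RhythmAgent:                       # "when": inter-onset intervals in seconds
    def __init__(self, pattern):
        self.pattern = pattern
    def onsets(self, n):
        t = 0.0
        for i in range(n):
            yield t
            t += self.pattern[i % len(self.pattern)]

class PitchAgent:                        # "what": pitch classes in a register
    def __init__(self, pitch_classes, register=5):
        self.pitch_classes, self.register = pitch_classes, register
    def choose(self):
        return 12 * self.register + random.choice(self.pitch_classes)

class DynamicsAgent:                     # "what": MIDI velocities
    def choose(self):
        return random.choice([40, 64, 90])

when = RhythmAgent([0.5, 0.25, 0.25])
pitch, dyn = PitchAgent([0, 5, 7, 10]), DynamicsAgent()
events = [(t, pitch.choose(), dyn.choose()) for t in when.onsets(8)]
print(events)                            # (onset time, MIDI pitch, velocity) triples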

These cognitive and perceptual constraints influenced the design of the SMuSe's architecture. At the low event fusion level, the SMuSe provides a set of synthesis techniques validated by psychoacoustic tests [2] that give perceptual control over the generation of timbre, as well as the use of MIDI information to define basic musical material such as


pitch, velocity and duration. Inspired by previous work on musical performance modeling [7], the SMuSe also makes it possible to modulate the expressiveness of the music generation by varying parameters such as phrasing, articulation and performance noise [2].

At the medium melodic and rhythmic grouping level, the SMuSe implements various state-of-the-art algorithmic composition tools (e.g. generation of tonal, Brownian and serial series of pitches and rhythms, Markov chains, etc.). The time scale of this mid-level of processing is on the order of 5 s for a single grouping, i.e. the time limit of auditory short-term memory.

The form level concerns large groupings of events over a long period of time (longer than short-term memory). It deals with entire sequences of music and relates to the structure and limits of long-term memory. Influenced by experiments in synthetic epistemology and situated robotics, this longer-term structure is achieved via the interaction with the environment [37, 38].

The modularity of the music processing chain is also reflected in the different SMuSe modules that specifically deal with time ("when") or material ("what").

    3.2. Agency

The agent framework is based on the principle that complex tasks can be accomplished through a society of simple, cross-connected, self-contained agents [22]. Here, an agent is understood as "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors" [32]. In the context of cognitive science, this paradigm takes a stand against a unified theory of mind in which a diversity of phenomena would be explained by a single set of rules. The claim here is that surprising, complex and emergent results can be obtained through the interaction of simple non-linear agents. The agent framework is particularly suited to building flexible real-time interactive musical systems based on the principles of modularity, real-time interaction and situatedness.

    3.3. Data-flow Programming

We chose to implement this hierarchy of musical agents in the SMuSe in a data-flow programming language called Max/MSP [40]. Data-flow programming conceptually models a program as a directed graph of data flowing between operations. This kind of model can easily represent parallel processing, which is common in biological systems, and is also convenient for representing an agent-based modular architecture.

Interestingly, programming in Max/MSP encourages the programmer to think in a way that is close to how a brain might work. Firstly, since Max/MSP is based on a data-flow paradigm, processes can operate in parallel (e.g. pitch and rhythm processes). Secondly, thanks to the concept of patch abstraction (a Max meta-patch that abstracts or includes another Max patch), one can easily build several layers of processing units (somewhat similar to the different layers of the cortex). Finally, each process can be connected to every other in a variety of ways (like neurons). Of course, the comparison to neural processes is limited to higher-level, organizational processes.

    3.4. Distributed Control

All the musical agents in the SMuSe are OSC-compatible [39], which means they can be controlled and accessed from anywhere (including over a network) at any time. This gives the system great flexibility and allows for shared collaborative compositions in which several clients access and modulate the music server. In this collaborative composition paradigm, every performer builds on what the others have done. The result is a complex sound structure that keeps evolving as long as different performers contribute changes to its current shape. A parallel can be drawn with stigmergic mechanisms of coordination between social insects such as ants [34, 3, 11]. In ant colonies, the pheromone trace left by one ant at a given time is used as a means to communicate with and stimulate the action of the others; in this way they manage to collectively build complex networks of trails towards food sources. Similarly, in a collective music paradigm powered by an OSC client/server architecture, one performer leaves a musical trace in the shared composition, which in turn stimulates the other co-performers to react and build on top of it.

3.5. Concurrent and On-the-fly Control of Musical Processes

We have proposed a biologically inspired memory and process architecture for the SMuSe as well as a computational model based on software agents. The OSC communication protocol makes it easy to send text-based commands to specific agents in the hierarchy, and allows for flexible and intuitive time-based, concurrent and on-the-fly control of musical processes.

The different musical agents in the SMuSe each have a specific ID/address at which they receive commands and data. The addresses are divided into /global (affecting the whole hierarchy), /voice[n] (affecting specific voices), and /synth[n] (affecting specific sound generators). The OSC syntax supports regular expressions, which makes it possible to address several modules at the same time with a compact syntax.

Patterns of perceptually grounded musical features are sent to the short-term memory (STM) and long-term memory (LTM) modules at any moment in time via specific commands, for example:

/* Example: fill up the STM */
/voice1/rhythm/pattern 4n 4n 8n 16n 16n 4n
/voice1/pitch/pattern 0 0 5 7 10 0
/voice1/pitch/register 4 5 4 4 4 5
/voice1/velocity 12 12 16 16 12 32
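Since every agent is addressed over OSC, such commands can be sent from any OSC-capable client. The sketch below uses the python-osc library; the host and port are placeholders, as the paper does not specify where the SMuSe server listens.

# pip install python-osc
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)   # placeholder host/port

# Fill up the STM of voice 1, mirroring the example above
client.send_message("/voice1/rhythm/pattern", ["4n", "4n", "8n", "16n", "16n", "4n"])
client.send_message("/voice1/pitch/pattern", [0, 0, 5, 7, 10, 0])
client.send_message("/voice1/pitch/register", [4, 5, 4, 4, 4, 5])
client.send_message("/voice1/velocity", [12, 12, 16, 16, 12, 32])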

Musical sequence generation follows different selection principles, a term inspired by Koenig's reflections on serial music and algorithmic composition [19]. It refers to the actions taken by the system to generate musical events using the available short- and long-term memory content. These actions can be deterministic (e.g. playback of a stored sequence) or based on the probability of occurrence of specific events (series, Markov chains, random events). This allows for a hybrid approach to algorithmic composition in which complex stochastic processes are mixed with more deterministic repeating patterns (cf. Table 1). Expressivity parameters such as articulation, tempo and dynamics can be continuously accessed and modulated.

Selection Principle / Description

Sequence: The order of selection follows the initial sequence (the sequence in memory is played back).

Inverse: The elements of the original sequence are selected starting from the end (the sequence in memory is played backward).

Markov: The elements of a sequence are chosen based on state transition probabilities.

Series: Uniform random choice between elements of the pattern without repetition; if an element of the pattern has already been selected, it cannot be selected again until all the other elements have been selected.

Aleatory: Elements of the sequence are chosen randomly.

Table 1. The action selection agents choose to play specific musical elements stored in the current working STM following deterministic or stochastic selection principles.
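The selection principles of Table 1 are simple enough to restate in a few lines of Python. The following generator is an illustrative re-implementation, not the SMuSe's own code; the Markov case assumes a first-order transition table supplied by the caller.

import random

def select(pattern, principle, transitions=None):
    n = len(pattern)
    if principle == "sequence":           # play back in the original order
        while True:
            yield from pattern
    elif principle == "inverse":          # play back from the end
        while True:
            yield from reversed(pattern)
    elif principle == "series":           # random order, no repetition per cycle
        while True:
            yield from random.sample(pattern, n)
    elif principle == "aleatory":         # independent random choices
        while True:
            yield random.choice(pattern)
    elif principle == "markov":           # first-order state transitions
        state = random.choice(pattern)
        while True:
            yield state
            nxt = transitions[state]
            state = random.choices(list(nxt.keys()), weights=list(nxt.values()))[0]

gen = select([0, 0, 5, 7, 10, 0], "series")
print([next(gen) for _ in range(6)])      # one full cycle, each stored element once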

    3.6. Human in the loop

We tested the SMuSe within different sensing environments, ranging from physiology sensors that provide implicit emotional user interaction (heart rate, electrodermal activity, electroencephalogram), to virtual and mixed-reality sensors for behavioral interaction (camera, gazers, lasers, pressure-sensitive floors), and finally MIDI and audio (microphone) for direct musical interaction (Figure 4). The SMuSe integrates sensory data from the environment (which conveys information about the human participants interacting with the system) in real time and sends this interpreted data to the music generation processes after appropriate fixed or learned musical mappings. The initial musical material generated by the SMuSe is amplified, transformed and nuanced as the interaction between the system and the participant evolves.

Figure 4. The SMuSe's environment: the SMuSe can interact with its environment through different sensors such as biosignals, camera, gazers, lasers, pressure-sensitive floor, MIDI and audio, but also via OSC commands sent to the music server over the network from client applications (such as a console terminal, IQR, the IanniX graphical score, the Torque game engine, etc.).

    3.7. Emotional Mappings

We have built a situated cognitive music system that is sensitive to its environment via musical, behavioral and physiological sensors. Thanks to its flexible architecture, the system is able to memorize, combine and generate complex musical structures in real time. We have described various sensate environments that provide feedback information from the human interactor and make it possible to close the interaction loop. As proposed previously, we take an approach that focuses on emotion-related feedback.

    3.7.1. Predefined Mappings

From advanced behavioral and physiological interfaces, we can infer emotion-related information about the interaction with the SMuSe. One possible way to take this feedback into account is to design fixed mappings based on previous results from experimental psychology studies that have investigated the emotional responses to specific musical parameters. This a priori knowledge can be used to drive the choice of musical parameters depending on the difference between the goal emotion to be expressed or induced and the emotional state detected by the system via its sensors. A number of reviews have proposed generic relationships between sound parameters and emotional responses [9, 13, 12]. These results serve as the basis for explicit emotional mappings (cf. Table 2) and have been confirmed in the specific context of the SMuSe's parameter space. This explicit design approach is detailed in [2].


Table 2. The relationship between musical parameters and perceived emotion: a summary of the main results (adapted from [10]).
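As a minimal illustration of how such a fixed mapping might operate, the function below nudges tempo and dynamics towards a target arousal level. The direction of the adjustments (faster and louder for higher arousal) reflects common findings in the music-and-emotion literature, but the gains and ranges are arbitrary assumptions and are not taken from Table 2 or from the SMuSe itself.

def predefined_mapping(target_arousal, sensed_arousal, tempo_bpm, velocity):
    # Both arousal values are assumed to lie in [0, 1]; the difference
    # between the goal emotion and the sensed emotion drives the change.
    error = target_arousal - sensed_arousal
    tempo_bpm = max(40, min(200, tempo_bpm + 30 * error))   # arbitrary gain/range
    velocity = max(20, min(120, velocity + 40 * error))     # arbitrary gain/range
    return tempo_bpm, velocity

# Sensed arousal is low but the goal emotion is highly aroused:
print(predefined_mapping(target_arousal=0.8, sensed_arousal=0.3,
                         tempo_bpm=100, velocity=64))       # -> (115.0, 84.0)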

    3.7.2. Adaptive Mappings

In cases where the relationship between changes of musical parameters and the emotional response has to be learned, explicit mapping is not possible. One solution in the SMuSe is to use a reinforcement learning (RL) agent that learns to adapt the choice of musical parameters based on the interaction of the system with the environment [36]. Reinforcement learning is a biologically plausible learning algorithm particularly suited to an explorative and adaptive approach to mapping, as it tries to find a sequence of parameter changes that optimizes a reward function (which can, for instance, relate to a state of emotional stress) [2]. Unlike most previous uses of reinforcement learning in music systems, where the reward relates to predefined musical rules or the quality of an improvisation, we are interested in the emotional feedback from the listener (Figure 5).

This approach contrasts with expert systems such as the KTH rule system [7, 8], which can modulate the expressivity of music by applying a set of predefined rules inferred from extensive prior music and performance analysis. Here, we propose a paradigm in which the system learns to autonomously tune its own parameters as a function of the desired reward (some emotional feedback) without using any a priori musical rules.
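A minimal sketch of this adaptive loop, assuming a small discrete set of candidate tempi and a scalar emotional reward, is given below as an epsilon-greedy bandit. The actual agent described in [36, 2] is more elaborate; here the listener's feedback is simulated by a placeholder function.

import random

actions = [60, 90, 120, 150]          # candidate tempi (bpm), assumed for the example
q = {a: 0.0 for a in actions}         # value estimate per action
counts = {a: 0 for a in actions}
epsilon = 0.1                         # exploration rate

def listener_reward(tempo):
    # Placeholder for the emotional feedback signal (e.g. derived from
    # physiology); this simulated listener happens to prefer ~120 bpm.
    return 1.0 - abs(tempo - 120) / 100.0 + random.gauss(0, 0.05)

for _ in range(500):
    a = random.choice(actions) if random.random() < epsilon else max(q, key=q.get)
    r = listener_reward(a)
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]    # incremental average of observed rewards

print(max(q, key=q.get))              # tempo the agent has learned to favor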

Interestingly, the biological validity of RL is supported by numerous studies in psychology and neuroscience that have found various examples of reinforcement learning in animal behavior (e.g. the foraging behavior of bees [23] or the dopamine system in primate brains [33]).

Figure 5. The reinforcement-based music system is composed of three main components: the music engine (the SMuSe), the reinforcement learning agent, and the listener who provides the reward signal.

    4. ARTISTIC REALIZATIONS

We have further explored the purposive construction of interactive installations and performances using the SMuSe system. To name but a few: during the VRoboser installation [2], the sensory inputs (motion, color, distance) of a 3D virtual Khepera robot (http://www.k-team.com/) living in a game-like environment modulated musical parameters in real time, thus creating a never-ending musical soundscape in the spirit of Brian Eno's Music for Airports. In another context, the SMuSe generated automatic soundscapes and music which reacted to and influenced the spatial behavior of humans and avatars in the mixed-reality space called XIM (for eXperience Induction Machine) [2], thus emphasizing the role of the environment and interaction in the musical composition. Based on similar premises, Re(PER)curso, an interactive mixed-reality performance involving dance, percussion, interactive music and video, was presented at the ArtFutura Festival 2007 and at the Museum of Modern Art in Barcelona the same year. The performance was composed of several interlaced layers of artistic and technological activity. The music had three components: a predefined soundscape, the percussionist who performed from a score, and the interactive composition system synchronized by the SMuSe. The physical actors, the percussionist and the dancer, were tracked by a video-based active tracking system that in turn controlled an array of moving lights illuminating the scene. The spatial information from the stage obtained by the tracking system was also projected onto the virtual world, where it modulated the avatar's behavior, allowing it to adjust body position, posture and gaze to the physical world. In 2009, the Brain Orchestra, a multimodal performance using brain-computer interfaces, explored the creative potential of a collection of brains directly interfaced to the world. During the performance, four brain musicians controlled a string quartet generated by the SMuSe using their brain activity alone. The orchestra was conducted by an "emotional conductor", whose emotional reactions were recorded using biosignal interfaces and fed back to the system. The Brain Orchestra was premiered in Prague at the FET 09 meeting organized by the European Commission [2]. Finally, a live performance of a piece inspired by Terry Riley's In C served as an illustration of the principles of parallelism, situatedness and emergence exhibited by the SMuSe at the Ernst Strungmann Forum on "Language, Music and the Brain: a mysterious relationship".

Figure 6. Artistic realizations: A) Re(PER)curso (ArtFutura and MACBA, Barcelona 2007); B) The Multimodal Brain Orchestra (FET, Prague 2009); C) XIM sonification (2007).

    5. CONCLUSIONS

The SMuSe illustrates a novel situated approach to music composition systems. It is built on a cognitively plausible architecture that takes into account the different time frames of music processing, and uses an agent framework to model a society of simple distributed musical processes. It takes advantage of its interaction with the environment to go beyond the classic sense-think-act paradigm [31]. It combines cognitively relevant representations with perceptually grounded sound synthesis techniques and is based on modern data-flow audio programming practices [28, 29]. This provides an intuitive, flexible and distributed control environment that can easily generate complex musical structures in real time. The SMuSe can sense its environment via a variety of sensors, notably physiology-based sensors. The analysis and extraction of relevant information from sensor data makes it possible to feed emotion-based feedback back into the system based on the responses of the human participant. The SMuSe provides a set of pre-wired mappings from emotions to musical parameters grounded in the literature on music and emotion, as well as a reinforcement learning agent that performs online adaptive mapping. It offers a well-grounded approach towards the development of advanced synthetic aesthetic systems and a further understanding of the fundamental psychological processes on which they rely.

    6. REFERENCES

    [1] E. Altenmuller, How many music centers are in thebrain? in Annals of the New York Academy of Sci-ences, vol. 930, no. The Biological Foundations ofMusic. John Wiley & Sons, 2001, pp. 273280.

    [2] Anonymous, For blind review purpose.

    [3] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarmintelligence: from natural to artificial systems. Ox-ford University Press, USA, 1999.

    [4] B. Bongers, Physical interfaces in the electronicarts. Interaction theory and interfacing techniquesfor real-time performance, Trends in Gestural Con-trol of Music, pp. 4170, 2000.

    [5] M. M. Bradley and P. J. Lang, Affective reactions toacoustic stimuli. Psychophysiology, vol. 37, no. 2,pp. 204215, March 2000.

    [6] J. Fodor, The language of thought. Harvard UnivPr, 1975.

    [7] A. Friberg, R. Bresin, and J. Sundberg, Overviewof the kth rule system for musical performance, Ad-vances in Cognitive Psychology, Special Issue onMusic Performance, vol. 2, no. 2-3, pp. 145161,2006.

    [8] A. Friberg, pdm: An expressive sequencer withreal-time control of the kth music-performancerules, Comput. Music J., vol. 30, no. 1, pp. 3748,2006.

    [9] A. Gabrielson and P. N. Juslin, Emotional expres-sion in music performance: Between the performersintention and the listeners experience, in Psychol-ogy of Music, vol. 24, no. 1, 1996, pp. 6891.

    [10] A. Gabrielsson and E. Lindstrm, Music and Emo-tion - Theory and Research, ser. Series in AffectiveScience. NewYork: Oxford University Press, 2001,ch. The Influence of Musical Structure on EmotionalExpression.

    [11] E. Hutchins and G. Lintern, Cognition in the Wild.MIT Press, Cambridge, Mass., 1995.

    [12] P. N. Juslin and P. Laukka, Communication of emo-tions in vocal expression and music performance:different channels, same code? Psychol Bull, vol.129, no. 5, pp. 770814, Sep 2003.

    [13] P. N. Juslin and J. A. Sloboda, Eds.,Music and emo-tion : theory and research. Oxford ; New York:Oxford University Press, 2001.

    [14] P. Kivy, Introduction to a Philosophy of Music. Ox-ford University Press, USA, 2002.

  • _370 _371

Table 2. The relationship between musical parameters and perceived emotion. A summary of the main results (adapted from [10]).
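To make the notion of a pre-wired mapping concrete, the following minimal sketch (in Python) encodes a few structure-emotion associations commonly reported in this literature, such as fast tempo and major mode with happiness, and slow tempo and minor mode with sadness. The parameter names and values are illustrative placeholders only; they do not reproduce the entries of Table 2 or the SMuSe's actual mapping.

    # Illustrative pre-wired emotion-to-parameter lookup; names and values are hypothetical.
    EMOTION_TO_PARAMS = {
        "happy":  {"tempo_bpm": 140, "mode": "major", "articulation": "staccato", "dynamics": 0.8},
        "sad":    {"tempo_bpm": 60,  "mode": "minor", "articulation": "legato",   "dynamics": 0.3},
        "tense":  {"tempo_bpm": 150, "mode": "minor", "articulation": "staccato", "dynamics": 0.9},
        "serene": {"tempo_bpm": 80,  "mode": "major", "articulation": "legato",   "dynamics": 0.4},
    }

    def parameters_for(emotion):
        """Return the pre-wired musical parameter set for a target emotion label."""
        return EMOTION_TO_PARAMS[emotion]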

    3.7.2. Adaptive Mappings

In cases where the relationship between changes of musical parameters and the emotional response has to be learned, explicit mapping is not possible. One solution in the SMuSe is to use a Reinforcement Learning (RL) agent that learns to adapt the choice of musical parameters based on the interaction of the system with the environment [36]. Reinforcement learning is a biologically plausible learning algorithm particularly suited to an explorative and adaptive approach to mapping, as it tries to find a sequence of parameter changes that optimizes a reward function (which can, for instance, relate to a state of emotional stress) [2]. Unlike most previous uses of reinforcement learning in music systems, where the reward relates to predefined musical rules or to the quality of an improvisation, we are interested in the emotional feedback from the listener (Figure 5).
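The exact learning scheme is specified in [36] and [2]; as a rough sketch of the idea, assuming a small discrete set of candidate parameter settings and a scalar emotion-derived reward, a simple epsilon-greedy learner could be written as follows. The class and variable names are hypothetical, and this simplification is not the SMuSe's actual agent.

    import random

    class EpsilonGreedyMapper:
        """Learns which candidate parameter setting maximizes an emotion-derived reward."""
        def __init__(self, settings, epsilon=0.1, alpha=0.2):
            self.settings = settings            # list of candidate parameter dictionaries
            self.epsilon = epsilon              # exploration rate
            self.alpha = alpha                  # learning rate for the value estimates
            self.values = [0.0] * len(settings)

        def choose(self):
            # Explore occasionally, otherwise exploit the best setting found so far.
            if random.random() < self.epsilon:
                return random.randrange(len(self.settings))
            return max(range(len(self.settings)), key=lambda i: self.values[i])

        def update(self, index, reward):
            # Move the value estimate of the chosen setting towards the observed reward.
            self.values[index] += self.alpha * (reward - self.values[index])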

This approach contrasts with expert systems such as the KTH rule system [7, 8], which modulates the expressivity of music by applying a set of predefined rules inferred from extensive prior analysis of music and performance. Here, we propose a paradigm where the system autonomously learns to tune its own parameters as a function of the desired reward (some emotional feedback), without relying on any a priori musical rule.

Interestingly enough, the biological validity of RL is supported by numerous studies in psychology and neuroscience that have found examples of reinforcement learning in animal behavior (e.g. the foraging behavior of bees [23] or the dopamine system in primate brains [33]).

Figure 5. The reinforcement-based music system is composed of three main components: the music engine (the SMuSe), the reinforcement learning agent, and the listener, who provides the reward signal.
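Putting the components of Figure 5 together, the interaction loop can be sketched as follows, reusing the EpsilonGreedyMapper and EMOTION_TO_PARAMS examples above. The OSC port, address space and reward extraction are hypothetical placeholders rather than the actual SMuSe interface, and the python-osc package is assumed for message sending.

    from pythonosc.udp_client import SimpleUDPClient  # assumes the python-osc package

    client = SimpleUDPClient("127.0.0.1", 9000)       # hypothetical port of the music engine
    mapper = EpsilonGreedyMapper(settings=list(EMOTION_TO_PARAMS.values()))

    def read_emotional_reward():
        """Placeholder: derive a scalar reward from physiological or behavioral sensors."""
        return 0.0

    for _ in range(100):                              # one interaction episode
        i = mapper.choose()                           # the agent selects a parameter setting
        for name, value in mapper.settings[i].items():
            client.send_message("/smuse/" + name, value)  # hypothetical OSC address space
        mapper.update(i, read_emotional_reward())     # the listener closes the loop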

    4. ARTISTIC REALIZATIONS

We have further explored the purposive construction of interactive installations and performances using the SMuSe system. To name but a few: during the VRoboser installation [2], the sensory inputs (motion, color, distance) of a 3D virtual Khepera¹ robot living in a game-like environment modulated musical parameters in real time, creating a never-ending musical soundscape in the spirit of Brian Eno's Music for Airports. In another context, the SMuSe generated automatic soundscapes and music that reacted to and influenced the spatial behavior of humans and avatars in the mixed-reality space called XIM (for eXperience Induction Machine) [2], thus emphasizing the role of the environment and of interaction in the musical composition. Based on similar premises, Re(PER)curso, an interactive mixed-reality performance involving dance, percussion, interactive music and video, was presented at the ArtFutura Festival 07 and at the Museum of Modern Art in Barcelona in the same year. The performance was composed of several interlaced layers of artistic and technological activities. The music had three components: a predefined soundscape, a percussionist who performed from a score, and the interactive composition system synchronized by the SMuSe. The physical actors, the percussionist and the dancer, were tracked by a video-based active tracking system that in turn controlled an array of moving lights illuminating the scene. The spatial information from the stage obtained by the tracking system was also projected onto the virtual world, where it modulated the avatar's behavior, allowing it to adjust its body position, posture and gaze to the physical world. In 2009, the Brain Orchestra, a multimodal performance using brain-computer interfaces, explored the creative potential of a collection of brains directly interfaced to the world. During the performance, four brain musicians controlled a string quartet generated by the SMuSe using their brain activity alone. The orchestra was conducted by an emotional conductor, whose emotional reactions were recorded using biosignal interfaces and fed back to the system.

¹ http://www.k-team.com/

Figure 6. Artistic realizations: A) Re(PER)curso (ArtFutura and MACBA, Barcelona 2007); B) The Multimodal Brain Orchestra (FET, Prague 2009); C) XIM sonification (2007).

The Brain Orchestra was premiered in Prague for the FET 09 meeting organized by the European Commission [2]. Finally, a live performance of a piece inspired by Terry Riley's In C served as an illustration of the principles of parallelism, situatedness and emergence exhibited by the SMuSe at the Ernst Strüngmann Forum on "Language, Music and the Brain: a mysterious relationship".
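As an aside, the kind of sensor-to-music mapping used in a VRoboser-style installation can be illustrated by a short sketch that rescales normalized sensor readings into musical parameter ranges; the chosen parameters, ranges and function names are hypothetical and only indicative of the approach.

    def scale(value, lo, hi):
        """Map a normalized sensor reading in [0, 1] onto the range [lo, hi]."""
        return lo + max(0.0, min(1.0, value)) * (hi - lo)

    def sensors_to_music(motion, color, distance):
        """Hypothetical mapping: faster motion raises the tempo, color selects the register,
        and proximity increases the loudness. All inputs are assumed normalized to [0, 1]."""
        return {
            "tempo_bpm": scale(motion, 60, 160),
            "register": int(scale(color, 36, 84)),        # MIDI pitch of the melodic center
            "dynamics": scale(1.0 - distance, 0.2, 0.9),  # closer objects sound louder
        }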

    5. CONCLUSIONS

The SMuSe illustrates a novel situated approach to music composition systems. It is built on a cognitively plausible architecture that takes into account the different time frames of music processing, and uses an agent framework to model a society of simple distributed musical processes. It takes advantage of its interaction with the environment to go beyond the classic sense-think-act paradigm [31]. It combines cognitively relevant representations with perceptually grounded sound synthesis techniques and is based on modern data-flow audio programming practices [28, 29]. This provides an intuitive, flexible and distributed control environment that can easily generate complex musical structures in real time. The SMuSe can sense its environment via a variety of sensors, notably physiology-based sensors. The analysis and extraction of relevant information from the sensor data makes it possible to re-inject emotion-based feedback into the system, based on the responses of the human participant. The SMuSe provides a set of pre-wired mappings from emotions to musical parameters grounded in the literature on music and emotion, as well as a reinforcement learning agent that performs online adaptive mapping. It offers a well-grounded approach towards the development of advanced synthetic aesthetic systems and towards a further understanding of the fundamental psychological processes on which they rely.

    6. REFERENCES

[1] E. Altenmüller, "How many music centers are in the brain?" in Annals of the New York Academy of Sciences, vol. 930, The Biological Foundations of Music. John Wiley & Sons, 2001, pp. 273–280.

[2] Anonymous, for blind review purposes.

[3] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, USA, 1999.

[4] B. Bongers, "Physical interfaces in the electronic arts. Interaction theory and interfacing techniques for real-time performance," Trends in Gestural Control of Music, pp. 41–70, 2000.

[5] M. M. Bradley and P. J. Lang, "Affective reactions to acoustic stimuli," Psychophysiology, vol. 37, no. 2, pp. 204–215, March 2000.

[6] J. Fodor, The Language of Thought. Harvard University Press, 1975.

[7] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2–3, pp. 145–161, 2006.

[8] A. Friberg, "pDM: An expressive sequencer with real-time control of the KTH music-performance rules," Computer Music Journal, vol. 30, no. 1, pp. 37–48, 2006.

[9] A. Gabrielsson and P. N. Juslin, "Emotional expression in music performance: Between the performer's intention and the listener's experience," Psychology of Music, vol. 24, no. 1, pp. 68–91, 1996.

[10] A. Gabrielsson and E. Lindström, "The influence of musical structure on emotional expression," in Music and Emotion: Theory and Research, ser. Series in Affective Science. New York: Oxford University Press, 2001.

[11] E. Hutchins, Cognition in the Wild. Cambridge, MA: MIT Press, 1995.

[12] P. N. Juslin and P. Laukka, "Communication of emotions in vocal expression and music performance: different channels, same code?" Psychological Bulletin, vol. 129, no. 5, pp. 770–814, Sep 2003.

[13] P. N. Juslin and J. A. Sloboda, Eds., Music and Emotion: Theory and Research. Oxford; New York: Oxford University Press, 2001.

[14] P. Kivy, Introduction to a Philosophy of Music. Oxford University Press, USA, 2002.


[15] B. R. Knapp and H. S. Lusted, "A bioelectric controller for computer music applications," Computer Music Journal, vol. 14, no. 1, pp. 42–47, 1990.

[16] S. Koelsch and W. Siebel, "Towards a neural basis of music perception," Trends in Cognitive Sciences, vol. 9, no. 12, pp. 578–584, 2005.

[17] C. Krumhansl, "An exploratory study of musical emotions and psychophysiology," Canadian Journal of Experimental Psychology, vol. 51, no. 4, pp. 336–353, 1997.

[18] C. Krumhansl, "Rhythm and pitch in music cognition," Psychological Bulletin, vol. 126, no. 1, pp. 159–179, 2000.

[19] O. Laske, "Composition theory in Koenig's Project One and Project Two," Computer Music Journal, pp. 54–65, 1981.

[20] D. Levitin and A. Tirovolas, "Current advances in the cognitive neuroscience of music," Annals of the New York Academy of Sciences, vol. 1156, The Year in Cognitive Neuroscience 2009, pp. 211–231, 2009.

[21] L. B. Meyer, Emotion and Meaning in Music. The University of Chicago Press, 1956.

[22] M. Minsky, The Society of Mind. Simon and Schuster, 1988.

[23] P. Montague, P. Dayan, C. Person, and T. Sejnowski, "Bee foraging in uncertain environments using predictive Hebbian learning," Nature, vol. 377, no. 6551, pp. 725–728, 1995.

[24] J. Paradiso, "Electronic music: new ways to play," IEEE Spectrum, vol. 34, no. 12, pp. 18–30, 2002.

[25] I. Peretz and M. Coltheart, "Modularity of music processing," Nature Neuroscience, vol. 6, no. 7, pp. 688–691, 2003.

[26] I. Peretz and R. Zatorre, "Brain organization for music processing," Annual Review of Psychology, vol. 56, 2005.

[27] R. Pfeifer and C. Scheier, Understanding Intelligence. The MIT Press, 2001.

[28] M. Puckette, "Pure Data: another integrated computer music environment," in Proceedings of the Second Intercollege Computer Music Concerts, Tachikawa, Japan, 1996.

[29] M. Puckette, T. Apel et al., "Real-time audio analysis tools for Pd and MSP," 1998.

[30] D. Rosenboom, "Biofeedback and the arts: Results of early experiments," Computer Music Journal, vol. 13, no. 4, pp. 86–88, 1989.

[31] R. Rowe, Interactive Music Systems: Machine Listening and Composing. Cambridge, MA: MIT Press, 1993.

[32] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2009.

[33] W. Schultz, P. Dayan, and P. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, p. 1593, 1997.

[34] H. Simon, The Sciences of the Artificial. Cambridge, MA: MIT Press, 1981.

[35] B. Snyder, Music and Memory: An Introduction. The MIT Press, 2000.

[36] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press, March 1998.

[37] P. F. M. J. Verschure, T. Voegtlin, and R. J. Douglas, "Environmentally mediated synergy between perception and behaviour in mobile robots," Nature, vol. 425, no. 6958, pp. 620–624, Oct 2003.

[38] P. F. Verschure, "Synthetic epistemology: The acquisition, retention, and expression of knowledge in natural and synthetic systems," in Proceedings of the 1998 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence), vol. 1. IEEE, 1998, pp. 147–152.

[39] M. Wright, "Open Sound Control: an enabling technology for musical networking," Organised Sound, vol. 10, no. 3, pp. 193–200, 2005.

[40] D. Zicarelli, "How I learned to love a program that does nothing," Computer Music Journal, vol. 26, pp. 44–51, 2002.

BACH: AN ENVIRONMENT FOR COMPUTER-AIDED COMPOSITION IN MAX

    Andrea Agostini

    Freelance composer

    Daniele Ghisi

Composer - Casa de Velázquez

    ABSTRACT

Environments for computer-aided composition (CAC for short), allowing the generation and transformation of symbolic musical data, are usually counterposed to real-time environments or sequencers. The counterposition is deeply methodological: in traditional CAC environments, interface changes have no effect until a certain refresh operation is performed, whereas real-time environments react immediately to user input. In this article we present a library for Max, named bach: automatic composer's helper, which adds highly refined capabilities for musical notation and symbolic processing to a typically real-time environment, in order to recompose the fracture between computer-aided composition and the real-time world.

    1. INTRODUCTION

Since the advent of computers there has been great interest in how to take advantage of their superior precision, speed and power in music-related activities. The best-known (and most commercially successful) direction has proven to be the generation and transformation of sound. In recent years, inexpensive personal computers (and lately even top-end mobile phones) have gained the ability to perform professional-quality audio transformation and generation in real time. On the other hand, several systems have been developed to process symbolic data rather than acoustic data - notes rather than sounds.

These systems can be roughly divided into tools for computer-assisted music engraving (such as Finale, Sibelius, LilyPond...) and tools for computer-aided composition (CAC for short), allowing the generation and transformation of symbolic musical data (such as OpenMusic¹ [1], PWGL², Common Music³...). Moreover, at least two graphical programming environments, the closely related Max and PureData, have MIDI control and sound generation and transformation among their main focuses - but at the same time they are capable of dealing with arbitrary sets of data, input/output devices and video. Indeed, the boundaries between all these categories are fuzzy: music engraving systems often allow non-trivial data processing; some sequencers also provide high-quality graphical representation of musical scores and sound treatment; modern CAC environments include tools for sound synthesis and transformation.

¹ http://repmus.ircam.fr/openmusic/home
² http://www2.siba.fi/PWGL/
³ http://commonmusic.sourceforge.net/

It should be remarked, though, that Max and PureData have very crude native support for sequencing, and essentially none for symbolic musical notation.

Another, orthogonal distinction should be made between real-time systems, which react immediately to interface actions (such as Finale, MaxMSP, ProTools...), and non-real-time systems, where these actions have no effect until a certain refresh operation is performed (such as LilyPond, OpenMusic, PWGL). The latter is the case of typical CAC environments; yet, in some cases this is unnatural, and it might be argued that there is no deep reason why symbolic processing should not be performed in real time. This does not mean that every compositional process would benefit from a real-time data flow, but some might, as we shall exemplify at the end of the paper. Real time is a resource, rather than an obligation. Yet, the lack of this resource has, up to now, pushed the development of CAC techniques only in the off-line direction.
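To illustrate the methodological difference in a language-neutral way (this is not bach's implementation), the following Python sketch contrasts a deferred-refresh transformation with a reactive one that recomputes its output as soon as a parameter changes; the class and callback names are hypothetical.

    class DeferredTranspose:
        """Non-real-time style: edits accumulate and the output only updates on refresh()."""
        def __init__(self, notes):
            self.notes = list(notes)     # MIDI pitches of the input material
            self.semitones = 0
            self.output = list(notes)

        def set_transposition(self, semitones):
            self.semitones = semitones   # no effect until refresh() is called

        def refresh(self):
            self.output = [n + self.semitones for n in self.notes]
            return self.output

    class ReactiveTranspose:
        """Real-time style: any parameter change immediately recomputes the output."""
        def __init__(self, notes, on_change):
            self.notes = list(notes)
            self.semitones = 0
            self.on_change = on_change   # downstream listener, e.g. a notation display
            self._recompute()

        def set_transposition(self, semitones):
            self.semitones = semitones
            self._recompute()            # reacts at once, as a Max patch would

        def _recompute(self):
            self.on_change([n + self.semitones for n in self.notes])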

In our own experience, the real-time or non-real-time nature of an environment for music composition deeply affects the very nature of the compositional process. Composers working with sequencers, plug-ins and electronic instruments need them to react immediately as they change their parameters; likewise, composers working with symbolic data might want the machine to adapt quickly to new parameter configurations. As composers ourselves, we believe that the creation and modification of a musical score is not an out-of-time activity, but one that follows the composer's discovery process and develops accordingly.

This issue has been addressed by Miller Puckette in [11]:

While we have good paradigms for describing processes (such as in the Max or Pd programs as they stand today), and while much work has been done on representations of musical data (ranging from searchable databases of sound to Patchwork and OpenMusic, and including Pd's unfinished data editor), we lack a fluid mechanism for the two worlds to interoperate.

    Arshia Cont in [5] adds:

The performers of computer music have been faster to grab ideas in real time manipulations and adopting them to their needs. Today, with many exceptions, a wide majority of composed mixed instrumental and electronic pieces