THE SMUSE: AN EMBODIED COGNITION APPROACH TO INTERACTIVE MUSIC COMPOSITION

Name of author
Address - Line 1
Address - Line 2
Address - Line 3

ABSTRACT

Computer-based music systems have evolved from computer-aided composition, which transposed the traditional paradigms of music composition to the digital realm, to complex feedback systems that allow for rich multimodal interactions. Yet many interactive music systems still rely on principles that are outdated in the light of modern situated cognitive systems design. Moreover, the role of human emotional feedback, arguably an important feature of musical experience, is rarely taken into account in the interaction loop. We propose to address these limitations by introducing a novel situated synthetic interactive composition system called the SMuSe (for Situated Music Server). The SMuSe is based on the principles of parallelism, situatedness, emergence and emotional feedback, and is built on a cognitively plausible architecture. It makes it possible to address questions at the intersection of music perception and cognition while serving as a creative tool for interactive music composition.

1. BACKGROUND

Interactivity has become a standard feature of many multimedia systems and plays a fundamental role in contemporary art practice. In particular, real-time human/machine interactive music systems are now omnipresent as both composition and live performance tools. Yet the term "interactive music system" is often misused. The interaction that takes place between a human and a system is a process that includes both control and feedback, where real-world actions are interpreted into the virtual domain of the system [4]. If some parts of the interaction loop are missing (for instance the cognitive level in Figure 1), the system is merely reactive rather than interactive. In most current human-computer music systems, the human agent interacts whereas the machine, for lack of cognitive modeling, only reacts. Although the term interactivity is widely used in the new media arts, most systems are simply reactive systems [4]. Furthermore, the cognitive modeling of interactive multimedia systems, when it exists, often relies on a classical cognitive science approach to artificial systems in which the different modules (e.g. perception, memory, action) are studied separately. This approach has since been challenged by modern cognitive science, which emphasizes the crucial role of the perception-action loop, the building of cognitive artifacts, and the interaction of the system with its environment [37]. In this paper we propose a novel approach to interactive music system design informed by modern cognitive science and present an implementation of such a system called the SMuSe.

Figure 1. Human-machine interaction (adapted from [4])

2. FROM EMBODIED COGNITIVE SCIENCE TO MUSIC SYSTEMS DESIGN

2.1. Classical View

Looking at the evolution of our understanding of cognitive systems in parallel with the evolution of music composition practices gives a particularly interesting perspective on some limitations of current interactive music systems.

The classical approach to cognitive science assumes that external behavior is mediated by internal representations [6] and that cognition is essentially the manipulation of these mental representations by sets of rules. It mainly relies on the sense-think-act framework [27], where future actions are planned according to perceptual information.

Interestingly, a parallel can be drawn between classical cognitive science and the development of classical music, which also relies heavily on formal structures. It puts the emphasis on internal processes (composition theory) to the detriment of the environment or the body, with centralized control of the performance (the conductor).



Disembodiment in classical music composition can be seen at several levels. Firstly, by training, the composer is used to composing in his head and translating his mental representations into an abstract musical representation: the score. Secondly, the score is traditionally interpreted live by the orchestra's conductor, who controls the main aspects of the musical interpretation, whereas the orchestra musicians themselves are left with relatively little interpretative freedom. Moreover, the role of the audience as an active actor in a musical performance is mostly neglected.

    2.2. Modern View

An alternative to classical cognitive science is the connectionist approach, which tries to build biologically plausible systems using neural networks. Unlike more traditional digital computation models based on serial processing and explicit manipulation of symbols, connectionist networks allow for fast parallel computation. Moreover, they do not rely on explicit rules but on emergent phenomena stemming from the interaction between simple neural units. Another related approach, called embodied cognitive science, puts the emphasis on the influence of the environment on internal processes. In some sense it replaced the view of cognition as representation by the view that cognition is an active process involving an agent acting in the environment. Consequently, the complexity of a generated structure is not only the result of the complexity of the underlying system, but is partly due to the complexity of its environment [34].

A piece that gives a good illustration of the principles of situatedness, distributed processing and emergence is In C by Terry Riley. In this piece, musicians are given a set of pitch sequences composed in advance, but each musician is left in charge of choosing when to start playing and repeating these sequences. The piece is formed by the combination of the decisions of each independent musician, who makes her decision based on the collective musical output that emerges from all the possible variations.

Following the recent evolution of our understanding of cognitive systems, we emphasize the crucial role of emergence, distributed processes and situatedness (as opposed to rule-based, serial, central, internal models) in the design of interactive music composition systems.

    2.3. Human-In-The-Loop

With the advent of new sensor technologies, the influence of the environment on music systems can be sensed via both explicit and implicit interfaces, which give access to the behavioral and physiological state of the user. Music is often referred to as the language of emotion, hence human emotions seem to be a natural feedback channel to take into account in the design of a situated music system. We believe that, in order to be complete, the design of a situated music system should take into consideration the emotional aspects of music.

    2.3.1. Explicit Gestural Interfaces

The advent of new sensing technologies has fostered the development of new kinds of interfaces for musical expression. Graphical user interfaces, tangible interfaces and gesture interfaces have become omnipresent in the design of live music performances and compositions [24]. Most of these are gesture-based interfaces that require explicit, conscious body movements from the user. They can give access to behavioral or self-reported information, but not to the implicit emotional states of the user.

    2.3.2. Implicit Biosignal Interface

Thanks to the development of more robust and accurate biosignal technologies, it is now possible to derive emotion-related information from physiological data and use it as an input to interactive music systems. Although the idea is not new [15, 30], the past few years have witnessed a growing interest from the computer music community in using physiological data such as heart rate, electrodermal activity, electroencephalogram and respiration to generate or transform sound and music. Providing an emotion-based physiological interface is highly relevant for a number of applications including music therapy, diagnosis, interactive gaming, and emotion-aware musical instruments.
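To make the idea of an implicit biosignal interface concrete, the following Python sketch computes a crude arousal index from heart inter-beat intervals and electrodermal activity. It is purely illustrative: the baselines, scaling constants and the equal weighting of the two signals are assumptions made for the example, not values or methods used by the SMuSe.

import numpy as np

def arousal_index(ibi_ms, eda_us, ibi_rest=800.0, eda_rest=2.0):
    # Crude arousal proxy in [0, 1] from inter-beat intervals (ms) and
    # electrodermal activity (microsiemens). Heart rate and skin
    # conductance above their resting baselines both push the index up.
    hr = 60000.0 / np.mean(ibi_ms)                 # mean heart rate (bpm)
    hr_rest = 60000.0 / ibi_rest
    hr_term = np.clip((hr - hr_rest) / 40.0, 0.0, 1.0)
    eda_term = np.clip((np.mean(eda_us) - eda_rest) / 5.0, 0.0, 1.0)
    return 0.5 * hr_term + 0.5 * eda_term

# Slightly elevated heart rate and skin conductance -> moderate arousal
print(arousal_index(ibi_ms=[700, 710, 690], eda_us=[3.5, 3.8, 3.6]))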

    2.3.3. Emotional Mapping

Music and its effect on the listener have long been a subject of fascination and scientific exploration, from the Greeks speculating on the acoustic properties of the voice [14] to Muzak researchers designing soothing elevator music. Music has become an omnipresent part of our day-to-day life. It is well known for affecting human emotional states, and most people enjoy music because of the emotions it evokes. Yet, although emotions seem to be a crucial aspect of music listening and performance, the scientific literature on music and emotion is scarce compared to that on music cognition or perception [21, 10, 17, 5]. The relationship between specific musical parameters and time-varying emotional responses is still not clear. Biofeedback-based interactive music systems appear to be an ideal paradigm to explore the complex relationship between emotion and music.

2.4. Perceptual and Cognitive Models of Musical Representation

What are the most relevant dimensions of music and how should they be represented in the system? Here, we take a cognitive psychology approach and define a set of parameters that are the most salient perceptually and the most meaningful cognitively. Music is a real-world stimulus that is meant to be appreciated by a human listener. It involves a complex set of perceptual and cognitive processes that take place in the central nervous system. The fast advances in the neuroscience of music over the past twenty years have taught us that these processes are partly interdependent, are integrated in time, and involve memory as well as emotional systems [16, 25, 26]. Their study sheds light on the structures and features that are involved in music processing and stand out as being perceptually and cognitively relevant. Experimental studies have found that musical perception happens at three different time scales: the event fusion level, where basic musical events such as pitch, intensity and timbre emerge (around 50 ms); the melodic and rhythmic grouping level, where patterns of those basic events are perceived (around 5 s); and finally the form level (from 5 s to 1 hour), which deals with large-scale sections of music [35] (Figure 2). This hierarchy of three time scales of music processing forms the basis on which we built the SMuSe's music processing chain.

Figure 2. The different levels of sequential grouping for musical material: event fusion, melodic and rhythmic grouping, and formal sectioning (from [35])
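As a rough illustration of this three-level hierarchy, the containment relation between events, groups and sections can be written down as a simple data structure. This is a sketch for the reader, not the SMuSe's actual internal representation; the field names are assumptions.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:        # event-fusion level (~50 ms): pitch, intensity, timbre
    pitch: int      # MIDI note number
    velocity: int   # MIDI velocity
    duration: float # seconds

@dataclass
class Group:        # melodic/rhythmic grouping level (~5 s)
    events: List[Event] = field(default_factory=list)

@dataclass
class Section:      # form level (5 s to ~1 hour)
    groups: List[Group] = field(default_factory=list)

motif = Group([Event(60, 80, 0.25), Event(67, 80, 0.25), Event(72, 90, 0.5)])
piece = Section([motif])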

    2.5. Music Processing Modules

Research on the brain substrates underlying music processing has shifted over the last twenty years from a classical view emphasizing a dichotomy between language (supposedly processed in the left hemisphere) and music (respectively the right hemisphere) to a modular view [1]. There is some evidence that music processing modules are organized into two parallel but largely independent submodules that deal with pitch content ("what?") and temporal content ("when?") respectively [26, 18]. This evidence suggests that they can be treated separately in a computational framework. Additionally, studies involving music-related deficits in neurologically impaired individuals (e.g. subjects with amusia who can no longer recognize melodies) have shown that the music faculty is composed of a set of neurally isolable processing components for pitch, loudness and rhythm [25]. The common view is that pitch, rhythm and loudness are first processed separately by the brain and then later form (around 25-50 ms) the impression of a unified musical object [20] (see [16] for a review of the neural basis of music perception). This modularity, as well as the three different levels and time scales of auditory memory (sound, groups, structure), forms a set of basic principles for designing our bio-mimetic music system.

Figure 3. The SMuSe is based on a hierarchy of musical agents

3. A COMPUTATIONAL MODEL BASED ON A SOCIETY OF MUSICAL AGENTS

The architecture of the SMuSe is inspired by neurological evidence. It follows a hierarchical and modular structure, and has been implemented as a set of agents using data-flow programming.

3.1. The SMuSe's Architecture

The SMuSe is built on a hierarchical, bio-mimetic and modular architecture. The musical material is represented at three different hierarchical levels, namely event fusion, event grouping and structure, corresponding to different memory constraints. From the generative point of view, SMuSe modules are divided into time modules ("when") that generate rhythmic patterns of events and content modules ("what") that, for each time event, choose musical material such as pitch and dynamics (Figure 3).
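The split between time ("when") and content ("what") modules can be sketched as follows. This is a toy Python stand-in for the Max/MSP agents of Figure 3, with made-up patterns and choice rules, intended only to show how independent when/what agents combine into a stream of events.

import random

class RhythmAgent:                       # "when": inter-onset intervals in seconds
    def __init__(self, pattern):
        self.pattern = pattern
    def onsets(self, n):
        t = 0.0
        for i in range(n):
            yield t
            t += self.pattern[i % len(self.pattern)]

class PitchAgent:                        # "what": pitch classes in a register
    def __init__(self, pitch_classes, register=5):
        self.pitch_classes, self.register = pitch_classes, register
    def choose(self):
        return 12 * self.register + random.choice(self.pitch_classes)

class DynamicsAgent:                     # "what": MIDI velocities
    def choose(self):
        return random.choice([40, 64, 90])

when = RhythmAgent([0.5, 0.25, 0.25])
pitch, dyn = PitchAgent([0, 5, 7, 10]), DynamicsAgent()
events = [(t, pitch.choose(), dyn.choose()) for t in when.onsets(8)]
print(events)                            # (onset time, MIDI pitch, velocity) triples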

These cognitive and perceptual constraints influenced the design of the SMuSe's architecture. At the low event fusion level, the SMuSe provides a set of synthesis techniques validated by psychoacoustic tests [2] that give perceptual control over the generation of timbre, as well as the use of MIDI information to define basic musical material such as


pitch, velocity and duration. Inspired by previous work on musical performance modeling [7], the SMuSe also makes it possible to modulate the expressiveness of the music generation by varying parameters such as phrasing, articulation and performance noise [2].

At the medium melodic and rhythmic grouping level, the SMuSe implements various state-of-the-art algorithmic composition tools (e.g. generation of tonal, Brownian and serial series of pitches and rhythms, Markov chains, etc.). The time scale of this mid-level of processing is on the order of 5 s for a single grouping, i.e. the time limit of auditory short-term memory.

The form level concerns large groupings of events over a long period of time (longer than short-term memory). It deals with entire sequences of music and relates to the structure and limits of long-term memory. Influenced by experiments in synthetic epistemology and situated robotics, this longer-term structure is achieved via the interaction with the environment [37, 38].

The modularity of the music processing chain is also reflected in the different SMuSe modules that specifically deal with time ("when") or material ("what").

    3.2. Agency

The agent framework is based on the principle that complex tasks can be accomplished through a society of simple, cross-connected, self-contained agents [22]. Here, an agent is understood as "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors" [32]. In the context of cognitive science, this paradigm takes a stand against a unified theory of mind in which a diversity of phenomena would be explained by a single set of rules. The claim here is that surprising, complex and emergent results can be obtained through the interaction of simple non-linear agents. The agent framework is particularly suited to building flexible real-time interactive musical systems based on the principles of modularity, real-time interaction and situatedness.

    3.3. Data-flow Programming

We chose to implement this hierarchy of musical agents in the SMuSe in a data-flow programming language called Max/MSP [40]. Data-flow programming conceptually models a program as a directed graph of data flowing between operations. This kind of model can easily represent parallel processing, which is common in biological systems, and is also convenient for representing an agent-based modular architecture.

Interestingly, programming in Max/MSP encourages the programmer to think in a way that is close to how a brain might work. Firstly, since Max/MSP is based on a data-flow paradigm, processes can operate in parallel (e.g. pitch and rhythm processes). Secondly, thanks to the concept of patch abstraction (a Max meta-patch that abstracts or includes another Max patch), one can easily build several layers of processing units (somewhat similar to the different layers of the cortex). Finally, each process can be connected to every other in a variety of ways (like neurons). Of course, the comparison to neural processes is limited to higher-level, organizational processes.

    3.4. Distributed Control

All the musical agents in the SMuSe are OSC-compatible [39], which means they can be controlled and accessed from anywhere (including over a network) at any time. This gives the system great flexibility and allows for shared collaborative compositions in which several clients access and modulate the music server. In this collaborative composition paradigm, every performer builds on what the others have done. The result is a complex sound structure that keeps evolving as long as different performers contribute changes to its current shape. A parallel can be drawn with stigmergic mechanisms of coordination between social insects such as ants [34, 3, 11]. In ant colonies, the pheromone trace left by one ant at a given time is used as a means to communicate with and stimulate the action of the others; in this way they manage to collectively build complex networks of trails towards food sources. Similarly, in a collective music paradigm powered by an OSC client/server architecture, one performer leaves a musical trace in the shared composition, which in turn stimulates the other co-performers to react and build on top of it.

3.5. Concurrent and On-the-fly Control of Musical Processes

We have proposed a biologically inspired memory and process architecture for the SMuSe as well as a computational model based on software agents. The OSC communication protocol makes it easy to send text-based commands to specific agents in the hierarchy, and allows for flexible and intuitive time-based, concurrent and on-the-fly control of musical processes.

The different musical agents in the SMuSe each have a specific ID/address at which they receive commands and data. The addresses are divided into /global (affecting the whole hierarchy), /voice[n] (affecting specific voices), and /synth[n] (affecting specific sound generators). The OSC syntax supports regular expressions, which makes it possible to address several modules at the same time with a compact syntax.

Patterns of perceptually grounded musical features are sent to the short-term memory (STM) and long-term memory (LTM) modules at any moment in time via specific commands, for example:

/* Example: fill up the STM */
/voice1/rhythm/pattern 4n 4n 8n 16n 16n 4n
/voice1/pitch/pattern 0 0 5 7 10 0
/voice1/pitch/register 4 5 4 4 4 5
/voice1/velocity 12 12 16 16 12 32
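Since every agent is addressed over OSC, such commands can be sent from any OSC-capable client. The sketch below uses the python-osc library; the host and port are placeholders, as the paper does not specify where the SMuSe server listens.

# pip install python-osc
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)   # placeholder host/port

# Fill up the STM of voice 1, mirroring the example above
client.send_message("/voice1/rhythm/pattern", ["4n", "4n", "8n", "16n", "16n", "4n"])
client.send_message("/voice1/pitch/pattern", [0, 0, 5, 7, 10, 0])
client.send_message("/voice1/pitch/register", [4, 5, 4, 4, 4, 5])
client.send_message("/voice1/velocity", [12, 12, 16, 16, 12, 32])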

Musical sequence generation follows different selection principles, a term inspired by Koenig's reflections on serial music and algorithmic composition [19]. It refers to the actions taken by the system to generate musical events using the available short- and long-term memory content. These actions can be deterministic (e.g. playback of a stored sequence) or based on the probability of occurrence of specific events (series, Markov chains, random events). This allows for a hybrid approach to algorithmic composition in which complex stochastic processes are mixed with more deterministic repeating patterns (cf. Table 1). Expressivity parameters such as articulation, tempo and dynamics can be continuously accessed and modulated.

Selection Principle / Description

Sequence: The order of selection follows the initial sequence (the sequence in memory is played back).

Inverse: The elements of the original sequence are selected starting from the end (the sequence in memory is played backward).

Markov: The elements of a sequence are chosen based on state transition probabilities.

Series: Uniform random choice between elements of the pattern without repetition; if an element of the pattern has already been selected, it cannot be selected again until all the other elements have been selected.

Aleatory: Elements of the sequence are chosen randomly.

Table 1. The action selection agents choose to play specific musical elements stored in the current working STM following deterministic or stochastic selection principles.
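The selection principles of Table 1 are simple enough to restate in a few lines of Python. The following generator is an illustrative re-implementation, not the SMuSe's own code; the Markov case assumes a first-order transition table supplied by the caller.

import random

def select(pattern, principle, transitions=None):
    n = len(pattern)
    if principle == "sequence":           # play back in the original order
        while True:
            yield from pattern
    elif principle == "inverse":          # play back from the end
        while True:
            yield from reversed(pattern)
    elif principle == "series":           # random order, no repetition per cycle
        while True:
            yield from random.sample(pattern, n)
    elif principle == "aleatory":         # independent random choices
        while True:
            yield random.choice(pattern)
    elif principle == "markov":           # first-order state transitions
        state = random.choice(pattern)
        while True:
            yield state
            nxt = transitions[state]
            state = random.choices(list(nxt.keys()), weights=list(nxt.values()))[0]

gen = select([0, 0, 5, 7, 10, 0], "series")
print([next(gen) for _ in range(6)])      # one full cycle, each stored element once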

    3.6. Human in the loop

We tested the SMuSe within different sensing environments, ranging from physiology sensors that provide implicit emotional user interaction (heart rate, electrodermal activity, electroencephalogram), to virtual and mixed-reality sensors for behavioral interaction (camera, gazers, lasers, pressure-sensitive floors), and finally MIDI and audio (microphone) for direct musical interaction (Figure 4). The SMuSe integrates sensory data from the environment (which conveys information about the human participants interacting with the system) in real time and sends this interpreted data to the music generation processes after appropriate fixed or learned musical mappings. The initial musical material generated by the SMuSe is amplified, transformed and nuanced as the interaction between the system and the participant evolves.

Figure 4. The SMuSe's environment: the SMuSe can interact with its environment through different sensors such as biosignals, camera, gazers, lasers, pressure-sensitive floor, MIDI and audio, but also via OSC commands sent to the music server over the network from client applications (such as a console terminal, IQR, the IanniX graphical score, the Torque game engine, etc.).

    3.7. Emotional Mappings

We have built a situated cognitive music system that is sensitive to its environment via musical, behavioral and physiological sensors. Thanks to its flexible architecture, the system is able to memorize, combine and generate complex musical structures in real time. We have described various sensate environments that provide feedback information from the human interactor and make it possible to close the interaction loop. As proposed previously, we take an approach that focuses on emotion-related feedback.

    3.7.1. Predefined Mappings

From advanced behavioral and physiological interfaces, we can infer emotion-related information about the interaction with the SMuSe. One possible way to take this feedback into account is to design fixed mappings based on previous results from experimental psychology studies that have investigated the emotional responses to specific musical parameters. This a priori knowledge can be used to drive the choice of musical parameters depending on the difference between the goal emotion to be expressed or induced and the emotional state detected by the system via its sensors. A number of reviews have proposed generic relationships between sound parameters and emotional responses [9, 13, 12]. These results serve as the basis for explicit emotional mappings (cf. Table 2) and have been confirmed in the specific context of the SMuSe's parameter space. This explicit design approach is detailed in [2].


Table 2. The relationship between musical parameters and perceived emotion: a summary of the main results (adapted from [10]).
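As a minimal illustration of how such a fixed mapping might operate, the function below nudges tempo and dynamics towards a target arousal level. The direction of the adjustments (faster and louder for higher arousal) reflects common findings in the music-and-emotion literature, but the gains and ranges are arbitrary assumptions and are not taken from Table 2 or from the SMuSe itself.

def predefined_mapping(target_arousal, sensed_arousal, tempo_bpm, velocity):
    # Both arousal values are assumed to lie in [0, 1]; the difference
    # between the goal emotion and the sensed emotion drives the change.
    error = target_arousal - sensed_arousal
    tempo_bpm = max(40, min(200, tempo_bpm + 30 * error))   # arbitrary gain/range
    velocity = max(20, min(120, velocity + 40 * error))     # arbitrary gain/range
    return tempo_bpm, velocity

# Sensed arousal is low but the goal emotion is highly aroused:
print(predefined_mapping(target_arousal=0.8, sensed_arousal=0.3,
                         tempo_bpm=100, velocity=64))       # -> (115.0, 84.0)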

    3.7.2. Adaptive Mappings

In cases where the relationship between changes of musical parameters and the emotional response has to be learned, explicit mapping is not possible. One solution in the SMuSe is to use a reinforcement learning (RL) agent that learns to adapt the choice of musical parameters based on the interaction of the system with the environment [36]. Reinforcement learning is a biologically plausible learning algorithm particularly suited to an explorative and adaptive approach to mapping, as it tries to find a sequence of parameter changes that optimizes a reward function (which can, for instance, relate to a state of emotional stress) [2]. Unlike most previous uses of reinforcement learning in music systems, where the reward relates to predefined musical rules or the quality of an improvisation, we are interested in the emotional feedback from the listener (Figure 5).

This approach contrasts with expert systems such as the KTH rule system [7, 8], which can modulate the expressivity of music by applying a set of predefined rules inferred from extensive prior music and performance analysis. Here, we propose a paradigm in which the system learns to autonomously tune its own parameters as a function of the desired reward (some emotional feedback) without using any a priori musical rules.
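A minimal sketch of this adaptive loop, assuming a small discrete set of candidate tempi and a scalar emotional reward, is given below as an epsilon-greedy bandit. The actual agent described in [36, 2] is more elaborate; here the listener's feedback is simulated by a placeholder function.

import random

actions = [60, 90, 120, 150]          # candidate tempi (bpm), assumed for the example
q = {a: 0.0 for a in actions}         # value estimate per action
counts = {a: 0 for a in actions}
epsilon = 0.1                         # exploration rate

def listener_reward(tempo):
    # Placeholder for the emotional feedback signal (e.g. derived from
    # physiology); this simulated listener happens to prefer ~120 bpm.
    return 1.0 - abs(tempo - 120) / 100.0 + random.gauss(0, 0.05)

for _ in range(500):
    a = random.choice(actions) if random.random() < epsilon else max(q, key=q.get)
    r = listener_reward(a)
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]    # incremental average of observed rewards

print(max(q, key=q.get))              # tempo the agent has learned to favor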

Interestingly, the biological validity of RL is supported by numerous studies in psychology and neuroscience that have found various examples of reinforcement learning in animal behavior (e.g. the foraging behavior of bees [23] or the dopamine system in primate brains [33]).

Figure 5. The reinforcement-based music system is composed of three main components: the music engine (the SMuSe), the reinforcement learning agent, and the listener who provides the reward signal.

    4. ARTISTIC REALIZATIONS

We have further explored the purposive construction of interactive installations and performances using the SMuSe system. To name but a few: during the VRoboser installation [2], the sensory inputs (motion, color, distance) of a 3D virtual Khepera robot (http://www.k-team.com/) living in a game-like environment modulated musical parameters in real time, thus creating a never-ending musical soundscape in the spirit of Brian Eno's Music for Airports. In another context, the SMuSe generated automatic soundscapes and music which reacted to and influenced the spatial behavior of humans and avatars in the mixed-reality space called XIM (for eXperience Induction Machine) [2], thus emphasizing the role of the environment and interaction in the musical composition. Based on similar premises, Re(PER)curso, an interactive mixed-reality performance involving dance, percussion, interactive music and video, was presented at the ArtFutura Festival 2007 and at the Museum of Modern Art in Barcelona the same year. The performance was composed of several interlaced layers of artistic and technological activity. The music had three components: a predefined soundscape, the percussionist who performed from a score, and the interactive composition system synchronized by the SMuSe. The physical actors, the percussionist and the dancer, were tracked by a video-based active tracking system that in turn controlled an array of moving lights illuminating the scene. The spatial information from the stage obtained by the tracking system was also projected onto the virtual world, where it modulated the avatar's behavior, allowing it to adjust body position, posture and gaze to the physical world. In 2009, the Brain Orchestra, a multimodal performance using brain-computer interfaces, explored the creative potential of a collection of brains directly interfaced to the world. During the performance, four brain musicians controlled a string quartet generated by the SMuSe using their brain activity alone. The orchestra was conducted by an "emotional conductor", whose emotional reactions were recorded using biosignal interfaces and fed back to the system. The Brain Orchestra was premiered in Prague at the FET 09 meeting organized by the European Commission [2]. Finally, a live performance of a piece inspired by Terry Riley's In C served as an illustration of the principles of parallelism, situatedness and emergence exhibited by the SMuSe at the Ernst Strungmann Forum on "Language, Music and the Brain: a mysterious relationship".

Figure 6. Artistic realizations: A) Re(PER)curso (ArtFutura and MACBA, Barcelona 2007); B) The Multimodal Brain Orchestra (FET, Prague 2009); C) XIM sonification (2007).

    5. CONCLUSIONS

The SMuSe illustrates a novel situated approach to music composition systems. It is built on a cognitively plausible architecture that takes into account the different time frames of music processing, and uses an agent framework to model a society of simple distributed musical processes. It takes advantage of its interaction with the environment to go beyond the classic sense-think-act paradigm [31]. It combines cognitively relevant representations with perceptually grounded sound synthesis techniques and is based on modern data-flow audio programming practices [28, 29]. This provides an intuitive, flexible and distributed control environment that can easily generate complex musical structures in real time. The SMuSe can sense its environment via a variety of sensors, notably physiology-based sensors. The analysis and extraction of relevant information from sensor data makes it possible to feed emotion-based feedback back into the system based on the responses of the human participant. The SMuSe provides a set of pre-wired mappings from emotions to musical parameters grounded in the literature on music and emotion, as well as a reinforcement learning agent that performs online adaptive mapping. It offers a well-grounded approach towards the development of advanced synthetic aesthetic systems and a further understanding of the fundamental psychological processes on which they rely.

    6. REFERENCES

    [1] E. Altenmuller, How many music centers are in thebrain? in Annals of the New York Academy of Sci-ences, vol. 930, no. The Biological Foundations ofMusic. John Wiley & Sons, 2001, pp. 273280.

    [2] Anonymous, For blind review purpose.

    [3] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarmintelligence: from natural to artificial systems. Ox-ford University Press, USA, 1999.

    [4] B. Bongers, Physical interfaces in the electronicarts. Interaction theory and interfacing techniquesfor real-time performance, Trends in Gestural Con-trol of Music, pp. 4170, 2000.

    [5] M. M. Bradley and P. J. Lang, Affective reactions toacoustic stimuli. Psychophysiology, vol. 37, no. 2,pp. 204215, March 2000.

    [6] J. Fodor, The language of thought. Harvard UnivPr, 1975.

    [7] A. Friberg, R. Bresin, and J. Sundberg, Overviewof the kth rule system for musical performance, Ad-vances in Cognitive Psychology, Special Issue onMusic Performance, vol. 2, no. 2-3, pp. 145161,2006.

    [8] A. Friberg, pdm: An expressive sequencer withreal-time control of the kth music-performancerules, Comput. Music J., vol. 30, no. 1, pp. 3748,2006.

    [9] A. Gabrielson and P. N. Juslin, Emotional expres-sion in music performance: Between the performersintention and the listeners experience, in Psychol-ogy of Music, vol. 24, no. 1, 1996, pp. 6891.

    [10] A. Gabrielsson and E. Lindstrm, Music and Emo-tion - Theory and Research, ser. Series in AffectiveScience. NewYork: Oxford University Press, 2001,ch. The Influence of Musical Structure on EmotionalExpression.

    [11] E. Hutchins and G. Lintern, Cognition in the Wild.MIT Press, Cambridge, Mass., 1995.

    [12] P. N. Juslin and P. Laukka, Communication of emo-tions in vocal expression and music performance:different channels, same code? Psychol Bull, vol.129, no. 5, pp. 770814, Sep 2003.

    [13] P. N. Juslin and J. A. Sloboda, Eds.,Music and emo-tion : theory and research. Oxford ; New York:Oxford University Press, 2001.

    [14] P. Kivy, Introduction to a Philosophy of Music. Ox-ford University Press, USA, 2002.

  • _370 _371

Table 2. The relationship between musical parameters and perceived emotion. A summary of the main results (adapted from [10]).
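To make the notion of a pre-wired mapping concrete, the following minimal sketch (in Python) encodes a few structure-emotion associations commonly reported in this literature, such as fast tempo and major mode with happiness, and slow tempo and minor mode with sadness. The parameter names and values are illustrative placeholders only; they do not reproduce the entries of Table 2 or the SMuSe's actual mapping.

    # Illustrative pre-wired emotion-to-parameter lookup; names and values are hypothetical.
    EMOTION_TO_PARAMS = {
        "happy":  {"tempo_bpm": 140, "mode": "major", "articulation": "staccato", "dynamics": 0.8},
        "sad":    {"tempo_bpm": 60,  "mode": "minor", "articulation": "legato",   "dynamics": 0.3},
        "tense":  {"tempo_bpm": 150, "mode": "minor", "articulation": "staccato", "dynamics": 0.9},
        "serene": {"tempo_bpm": 80,  "mode": "major", "articulation": "legato",   "dynamics": 0.4},
    }

    def parameters_for(emotion):
        """Return the pre-wired musical parameter set for a target emotion label."""
        return EMOTION_TO_PARAMS[emotion]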

    3.7.2. Adaptive Mappings

In cases where the relationship between changes of musical parameters and the emotional response has to be learned, explicit mapping is not possible. One solution in the SMuSe is to use a Reinforcement Learning (RL) agent that learns to adapt the choice of musical parameters based on the interaction of the system with the environment [36]. Reinforcement learning is a biologically plausible learning algorithm particularly suited to an explorative and adaptive approach to mapping, as it tries to find a sequence of parameter changes that optimizes a reward function (which can, for instance, relate to a state of emotional stress) [2]. Unlike most previous uses of reinforcement learning in music systems, where the reward relates to predefined musical rules or to the quality of an improvisation, we are interested in the emotional feedback from the listener (Figure 5).
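The exact learning scheme is specified in [36] and [2]; as a rough sketch of the idea, assuming a small discrete set of candidate parameter settings and a scalar emotion-derived reward, a simple epsilon-greedy learner could be written as follows. The class and variable names are hypothetical, and this simplification is not the SMuSe's actual agent.

    import random

    class EpsilonGreedyMapper:
        """Learns which candidate parameter setting maximizes an emotion-derived reward."""
        def __init__(self, settings, epsilon=0.1, alpha=0.2):
            self.settings = settings            # list of candidate parameter dictionaries
            self.epsilon = epsilon              # exploration rate
            self.alpha = alpha                  # learning rate for the value estimates
            self.values = [0.0] * len(settings)

        def choose(self):
            # Explore occasionally, otherwise exploit the best setting found so far.
            if random.random() < self.epsilon:
                return random.randrange(len(self.settings))
            return max(range(len(self.settings)), key=lambda i: self.values[i])

        def update(self, index, reward):
            # Move the value estimate of the chosen setting towards the observed reward.
            self.values[index] += self.alpha * (reward - self.values[index])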

This approach contrasts with expert systems such as the KTH rule system [7, 8], which modulates the expressivity of music by applying a set of predefined rules inferred from extensive prior analysis of music and performance. Here, we propose a paradigm where the system autonomously learns to tune its own parameters as a function of the desired reward (some emotional feedback), without relying on any a priori musical rule.

Interestingly enough, the biological validity of RL is supported by numerous studies in psychology and neuroscience that have found examples of reinforcement learning in animal behavior (e.g. the foraging behavior of bees [23] or the dopamine system in primate brains [33]).

Figure 5. The reinforcement-based music system is composed of three main components: the music engine (the SMuSe), the reinforcement learning agent, and the listener, who provides the reward signal.
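Putting the components of Figure 5 together, the interaction loop can be sketched as follows, reusing the EpsilonGreedyMapper and EMOTION_TO_PARAMS examples above. The OSC port, address space and reward extraction are hypothetical placeholders rather than the actual SMuSe interface, and the python-osc package is assumed for message sending.

    from pythonosc.udp_client import SimpleUDPClient  # assumes the python-osc package

    client = SimpleUDPClient("127.0.0.1", 9000)       # hypothetical port of the music engine
    mapper = EpsilonGreedyMapper(settings=list(EMOTION_TO_PARAMS.values()))

    def read_emotional_reward():
        """Placeholder: derive a scalar reward from physiological or behavioral sensors."""
        return 0.0

    for _ in range(100):                              # one interaction episode
        i = mapper.choose()                           # the agent selects a parameter setting
        for name, value in mapper.settings[i].items():
            client.send_message("/smuse/" + name, value)  # hypothetical OSC address space
        mapper.update(i, read_emotional_reward())     # the listener closes the loop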

    4. ARTISTIC REALIZATIONS

We have further explored the purposive construction of interactive installations and performances using the SMuSe system. To name but a few: during the VRoboser installation [2], the sensory inputs (motion, color, distance) of a 3D virtual Khepera¹ robot living in a game-like environment modulated musical parameters in real time, creating a never-ending musical soundscape in the spirit of Brian Eno's Music for Airports. In another context, the SMuSe generated automatic soundscapes and music that reacted to and influenced the spatial behavior of humans and avatars in the mixed-reality space called XIM (for eXperience Induction Machine) [2], thus emphasizing the role of the environment and of interaction in the musical composition. Based on similar premises, Re(PER)curso, an interactive mixed-reality performance involving dance, percussion, interactive music and video, was presented at the ArtFutura Festival 07 and at the Museum of Modern Art in Barcelona in the same year. The performance was composed of several interlaced layers of artistic and technological activities. The music had three components: a predefined soundscape, a percussionist who performed from a score, and the interactive composition system synchronized by the SMuSe. The physical actors, the percussionist and the dancer, were tracked by a video-based active tracking system that in turn controlled an array of moving lights illuminating the scene. The spatial information from the stage obtained by the tracking system was also projected onto the virtual world, where it modulated the avatar's behavior, allowing it to adjust its body position, posture and gaze to the physical world. In 2009, the Brain Orchestra, a multimodal performance using brain-computer interfaces, explored the creative potential of a collection of brains directly interfaced to the world. During the performance, four brain musicians controlled a string quartet generated by the SMuSe using their brain activity alone. The orchestra was conducted by an emotional conductor, whose emotional reactions were recorded using biosignal interfaces and fed back to the system.

¹ http://www.k-team.com/

Figure 6. Artistic realizations: A) Re(PER)curso (ArtFutura and MACBA, Barcelona 2007); B) The Multimodal Brain Orchestra (FET, Prague 2009); C) XIM sonification (2007).

The Brain Orchestra was premiered in Prague for the FET 09 meeting organized by the European Commission [2]. Finally, a live performance of a piece inspired by Terry Riley's In C served as an illustration of the principles of parallelism, situatedness and emergence exhibited by the SMuSe at the Ernst Strüngmann Forum on "Language, Music and the Brain: a mysterious relationship".
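As an aside, the kind of sensor-to-music mapping used in a VRoboser-style installation can be illustrated by a short sketch that rescales normalized sensor readings into musical parameter ranges; the chosen parameters, ranges and function names are hypothetical and only indicative of the approach.

    def scale(value, lo, hi):
        """Map a normalized sensor reading in [0, 1] onto the range [lo, hi]."""
        return lo + max(0.0, min(1.0, value)) * (hi - lo)

    def sensors_to_music(motion, color, distance):
        """Hypothetical mapping: faster motion raises the tempo, color selects the register,
        and proximity increases the loudness. All inputs are assumed normalized to [0, 1]."""
        return {
            "tempo_bpm": scale(motion, 60, 160),
            "register": int(scale(color, 36, 84)),        # MIDI pitch of the melodic center
            "dynamics": scale(1.0 - distance, 0.2, 0.9),  # closer objects sound louder
        }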

    5. CONCLUSIONS

The SMuSe illustrates a novel situated approach to music composition systems. It is built on a cognitively plausible architecture that takes into account the different time frames of music processing, and uses an agent framework to model a society of simple distributed musical processes. It takes advantage of its interaction with the environment to go beyond the classic sense-think-act paradigm [31]. It combines cognitively relevant representations with perceptually grounded sound synthesis techniques and is based on modern data-flow audio programming practices [28, 29]. This provides an intuitive, flexible and distributed control environment that can easily generate complex musical structures in real time. The SMuSe can sense its environment via a variety of sensors, notably physiology-based sensors. The analysis and extraction of relevant information from the sensor data makes it possible to re-inject emotion-based feedback into the system, based on the responses of the human participant. The SMuSe provides a set of pre-wired mappings from emotions to musical parameters grounded in the literature on music and emotion, as well as a reinforcement learning agent that performs online adaptive mapping. It offers a well-grounded approach towards the development of advanced synthetic aesthetic systems and towards a further understanding of the fundamental psychological processes on which they rely.

    6. REFERENCES

[1] E. Altenmüller, "How many music centers are in the brain?" in Annals of the New York Academy of Sciences, vol. 930, The Biological Foundations of Music. John Wiley & Sons, 2001, pp. 273–280.

[2] Anonymous, for blind review purposes.

[3] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, USA, 1999.

[4] B. Bongers, "Physical interfaces in the electronic arts. Interaction theory and interfacing techniques for real-time performance," Trends in Gestural Control of Music, pp. 41–70, 2000.

[5] M. M. Bradley and P. J. Lang, "Affective reactions to acoustic stimuli," Psychophysiology, vol. 37, no. 2, pp. 204–215, March 2000.

[6] J. Fodor, The Language of Thought. Harvard University Press, 1975.

[7] A. Friberg, R. Bresin, and J. Sundberg, "Overview of the KTH rule system for musical performance," Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, no. 2–3, pp. 145–161, 2006.

[8] A. Friberg, "pDM: An expressive sequencer with real-time control of the KTH music-performance rules," Computer Music Journal, vol. 30, no. 1, pp. 37–48, 2006.

[9] A. Gabrielsson and P. N. Juslin, "Emotional expression in music performance: Between the performer's intention and the listener's experience," Psychology of Music, vol. 24, no. 1, pp. 68–91, 1996.

[10] A. Gabrielsson and E. Lindström, "The influence of musical structure on emotional expression," in Music and Emotion: Theory and Research, ser. Series in Affective Science. New York: Oxford University Press, 2001.

[11] E. Hutchins, Cognition in the Wild. Cambridge, MA: MIT Press, 1995.

[12] P. N. Juslin and P. Laukka, "Communication of emotions in vocal expression and music performance: different channels, same code?" Psychological Bulletin, vol. 129, no. 5, pp. 770–814, Sep 2003.

[13] P. N. Juslin and J. A. Sloboda, Eds., Music and Emotion: Theory and Research. Oxford; New York: Oxford University Press, 2001.

[14] P. Kivy, Introduction to a Philosophy of Music. Oxford University Press, USA, 2002.


[15] B. R. Knapp and H. S. Lusted, "A bioelectric controller for computer music applications," Computer Music Journal, vol. 14, no. 1, pp. 42–47, 1990.

[16] S. Koelsch and W. Siebel, "Towards a neural basis of music perception," Trends in Cognitive Sciences, vol. 9, no. 12, pp. 578–584, 2005.

[17] C. Krumhansl, "An exploratory study of musical emotions and psychophysiology," Canadian Journal of Experimental Psychology, vol. 51, no. 4, pp. 336–353, 1997.

[18] C. Krumhansl, "Rhythm and pitch in music cognition," Psychological Bulletin, vol. 126, no. 1, pp. 159–179, 2000.

[19] O. Laske, "Composition theory in Koenig's Project One and Project Two," Computer Music Journal, pp. 54–65, 1981.

[20] D. Levitin and A. Tirovolas, "Current advances in the cognitive neuroscience of music," Annals of the New York Academy of Sciences, vol. 1156, The Year in Cognitive Neuroscience 2009, pp. 211–231, 2009.

[21] L. B. Meyer, Emotion and Meaning in Music. The University of Chicago Press, 1956.

[22] M. Minsky, The Society of Mind. Simon and Schuster, 1988.

[23] P. Montague, P. Dayan, C. Person, and T. Sejnowski, "Bee foraging in uncertain environments using predictive Hebbian learning," Nature, vol. 377, no. 6551, pp. 725–728, 1995.

[24] J. Paradiso, "Electronic music: new ways to play," IEEE Spectrum, vol. 34, no. 12, pp. 18–30, 2002.

[25] I. Peretz and M. Coltheart, "Modularity of music processing," Nature Neuroscience, vol. 6, no. 7, pp. 688–691, 2003.

[26] I. Peretz and R. Zatorre, "Brain organization for music processing," Annual Review of Psychology, vol. 56, 2005.

[27] R. Pfeifer and C. Scheier, Understanding Intelligence. The MIT Press, 2001.

[28] M. Puckette, "Pure Data: another integrated computer music environment," in Proceedings of the Second Intercollege Computer Music Concerts, Tachikawa, Japan, 1996.

[29] M. Puckette, T. Apel et al., "Real-time audio analysis tools for Pd and MSP," 1998.

[30] D. Rosenboom, "Biofeedback and the arts: Results of early experiments," Computer Music Journal, vol. 13, no. 4, pp. 86–88, 1989.

[31] R. Rowe, Interactive Music Systems: Machine Listening and Composing. Cambridge, MA: MIT Press, 1993.

[32] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2009.

[33] W. Schultz, P. Dayan, and P. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, p. 1593, 1997.

[34] H. Simon, The Sciences of the Artificial. Cambridge, MA: MIT Press, 1981.

[35] B. Snyder, Music and Memory: An Introduction. The MIT Press, 2000.

[36] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press, March 1998.

[37] P. F. M. J. Verschure, T. Voegtlin, and R. J. Douglas, "Environmentally mediated synergy between perception and behaviour in mobile robots," Nature, vol. 425, no. 6958, pp. 620–624, Oct 2003.

[38] P. F. Verschure, "Synthetic epistemology: The acquisition, retention, and expression of knowledge in natural and synthetic systems," in Proceedings of the 1998 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence), vol. 1. IEEE, 1998, pp. 147–152.

[39] M. Wright, "Open Sound Control: an enabling technology for musical networking," Organised Sound, vol. 10, no. 3, pp. 193–200, 2005.

[40] D. Zicarelli, "How I learned to love a program that does nothing," Computer Music Journal, vol. 26, pp. 44–51, 2002.

BACH: AN ENVIRONMENT FOR COMPUTER-AIDED COMPOSITION IN MAX

    Andrea Agostini

    Freelance composer

    Daniele Ghisi

Composer - Casa de Velázquez

    ABSTRACT

Environments for computer-aided composition (CAC for short), allowing the generation and transformation of symbolic musical data, are usually counterposed to real-time environments or sequencers. The counterposition is deeply methodological: in traditional CAC environments, interface changes have no effect until a certain refresh operation is performed, whereas real-time environments react immediately to user input. In this article we present a library for Max, named bach: automatic composer's helper, which adds highly refined capabilities for musical notation and symbolic processing to a typically real-time environment, in order to recompose the fracture between computer-aided composition and the real-time world.

    1. INTRODUCTION

Since the advent of computers there has been great interest in how to take advantage of their superior precision, speed and power in music-related activities. The best-known (and most commercially successful) direction has proven to be the generation and transformation of sound. In recent years, inexpensive personal computers (and lately even top-end mobile phones) have gained the ability to perform professional-quality audio transformation and generation in real time. On the other hand, several systems have been developed to process symbolic data rather than acoustic data - notes rather than sounds.

These systems can be roughly divided into tools for computer-assisted music engraving (such as Finale, Sibelius, LilyPond...) and tools for computer-aided composition (CAC for short), allowing the generation and transformation of symbolic musical data (such as OpenMusic¹ [1], PWGL², Common Music³...). Moreover, at least two graphical programming environments, the closely related Max and PureData, have MIDI control and sound generation and transformation among their main focuses - but at the same time they are capable of dealing with arbitrary sets of data, input/output devices and video. Indeed, the boundaries between all these categories are fuzzy: music engraving systems often allow non-trivial data processing; some sequencers also provide high-quality graphical representation of musical scores and sound treatment; modern CAC environments include tools for sound synthesis and transformation.

¹ http://repmus.ircam.fr/openmusic/home
² http://www2.siba.fi/PWGL/
³ http://commonmusic.sourceforge.net/

It should be remarked, though, that Max and PureData have very crude native support for sequencing, and essentially none for symbolic musical notation.

Another, orthogonal distinction should be made between real-time systems, which react immediately to interface actions (such as Finale, MaxMSP, ProTools...), and non-real-time systems, where these actions have no effect until a certain refresh operation is performed (such as LilyPond, OpenMusic, PWGL). The latter is the case of typical CAC environments; yet, in some cases this is unnatural, and it might be argued that there is no deep reason why symbolic processing should not be performed in real time. This does not mean that every compositional process would benefit from a real-time data flow, but some might, as we shall exemplify at the end of the paper. Real time is a resource, rather than an obligation. Yet, the lack of this resource has, up to now, pushed the development of CAC techniques only in the off-line direction.
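To illustrate the methodological difference in a language-neutral way (this is not bach's implementation), the following Python sketch contrasts a deferred-refresh transformation with a reactive one that recomputes its output as soon as a parameter changes; the class and callback names are hypothetical.

    class DeferredTranspose:
        """Non-real-time style: edits accumulate and the output only updates on refresh()."""
        def __init__(self, notes):
            self.notes = list(notes)     # MIDI pitches of the input material
            self.semitones = 0
            self.output = list(notes)

        def set_transposition(self, semitones):
            self.semitones = semitones   # no effect until refresh() is called

        def refresh(self):
            self.output = [n + self.semitones for n in self.notes]
            return self.output

    class ReactiveTranspose:
        """Real-time style: any parameter change immediately recomputes the output."""
        def __init__(self, notes, on_change):
            self.notes = list(notes)
            self.semitones = 0
            self.on_change = on_change   # downstream listener, e.g. a notation display
            self._recompute()

        def set_transposition(self, semitones):
            self.semitones = semitones
            self._recompute()            # reacts at once, as a Max patch would

        def _recompute(self):
            self.on_change([n + self.semitones for n in self.notes])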

In our own experience, the real-time or non-real-time nature of an environment for music composition deeply affects the very nature of the compositional process. Composers working with sequencers, plug-ins and electronic instruments need them to react immediately as they change their parameters; likewise, composers working with symbolic data might want the machine to adapt quickly to new parameter configurations. As composers ourselves, we believe that the creation and modification of a musical score is not an out-of-time activity, but one that follows the composer's discovery process and develops accordingly.

This issue has been addressed by Miller Puckette in [11]:

While we have good paradigms for describing processes (such as in the Max or Pd programs as they stand today), and while much work has been done on representations of musical data (ranging from searchable databases of sound to Patchwork and OpenMusic, and including Pd's unfinished data editor), we lack a fluid mechanism for the two worlds to interoperate.

    Arshia Cont in [5] adds:

The performers of computer music have been faster to grab ideas in real time manipulations and adopting them to their needs. Today, with many exceptions, a wide majority of composed mixed instrumental and electronic pieces