

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2020

Reciprocal sound transformations for computer supported collaborative jamming

ROOSA KALLIONPÄÄ

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


SAMMANFATTNING
Jamming in a group with digital musical instruments (DMI) reveals a need to synchronize the output signals. Temporal solutions have been established, but a better understanding of how live sound transformations could be balanced across several instruments is needed. In this work, a technology probe for reciprocal sound transformations was designed and developed by networking the instruments of four musicians, and a layered mapping was created between a shared interface, high-level sound attributes and the sound synthesis parameters of each instrument. The probe was designed and used during co-design workshops, where seven high-level sound attributes were constructed according to the spectromorphology framework. The analysis, in which the notion of sonic narrative and the concept of flow were applied, reveals how real-time control of reciprocal sound transformations fosters participation by supporting role-taking, motivating the ensemble and directing the focus of its members. Even though generality of the implemented attributes cannot be claimed, challenges of the chosen mapping strategy and of the user interface were identified.


Reciprocal sound transformations for computer supported collaborative jamming

Roosa Kallionpää KTH Royal Institute of

Technology Stockholm, Sweden

[email protected]

ABSTRACT
Collaborative jamming with digital musical instruments (DMI) exposes a need for output synchronization. While temporal solutions have been established, a better understanding of how live sound transformations could be balanced across instruments is required. In this work, a technology probe for reciprocal sound transformations was designed and developed by networking the instruments of four musicians and employing layered mapping between a shared interface, high-level sound attributes, and the sound synthesis parameters of each instrument. The probe was designed and used during a series of participatory design workshops, where seven high-level attributes were constructed according to the spectromorphology framework. The analysis, in which the notion of sonic narrative and the concept of flow were applied, reveals how live control of reciprocal sound transformations facilitates collaboration by supporting role-taking, motivating the ensemble, and directing the focus of its members. While generality of the implemented attributes cannot be claimed, challenges of the chosen mapping strategy and requirements for the user interface were identified.

Author Keywords
Computer Supported Collaborative Work; Jamming; Layered Mapping; Open Sound Control; Spectromorphology; Internet of Musical Things.

INTRODUCTION
The digital age has fostered a shift in both music distribution and production. The community on new interfaces for musical expression (NIME) researches the opportunities that human-computer interaction and design provide for musicians. Emerging challenges include collaborative live improvisation [1], where digital musical instruments (DMI) need to balance computer support against the creative freedom of multiple players in real time.

Music production that is no longer limited to the physical capabilities of acoustic instruments or their users brought about the liberation of sound [2], expanding traditional approaches to composing with an infinite sound palette [3]. On the other hand, modeling musical structures has enabled performing operations, such as transposing a melody or fixing the tempo of recorded audio tracks, at the touch of a button with digital audio workstations (DAW) such as Ableton Live [4].

Figure 1. Collaborative jamming with DMIs

To utilize these features during live play, physical interfaces can be mapped [5] to trigger sounds from such software, varying from individual effects to full recorded tracks with synchronized timing and tuning. This separation between pre-programmed and improvised elements becomes essential in the design of live DMIs. The recorded tracks of different instruments can be automatically aligned to ensure an aesthetic collaboration: for example, MIDI clock synchronization [6] and Ableton Link [7] extend temporal synchronization to all devices. On the other hand, the rich timbre and frequency domain of computer music [8], which has shifted the focus of composing from instruments, rhythms and melodies deeper into the sound spectra [9], provides powerful tools for improvised live control. However, because musicians need to deliberately select sound transformations according to all sounds being played, the mapping strategy of the physical interface and the sonic narrative [3], aligning such transformations across different sound sources is far more complex than for acoustic instruments. Thus, user-driven computer support could facilitate the continuous balancing of improvised sound effects. This study investigates how layered mapping of continuous, high-level sound attributes across networked instruments can be used for accessible timbral balancing during collaborative jamming. Further, it discusses the impact that controlling such attributes has on the improvisation process and how useful attributes could be generalized for all digital instruments.


A technology probe [10] was co-designed in a participatory manner with four musicians during a series of jamming workshops as well as individual meetings, and used during a live performance.

BACKGROUND
In this section, previous inspiring explorations with networked instruments are introduced. Next, the concept of layered mapping and the potential of abstraction layers in reciprocal sound transformations across networked instruments are explained. For deriving abstraction layers, conceptualizing sound transformations with the spectromorphology framework and previously applied music descriptors are discussed. Finally, the design goals for a collaborative jamming support system are considered.

Networked instruments
When microcomputers became available to consumers in the 1970s, explorations with collaborative live jamming accelerated. Melodies played with one single-board computer could be automatically transposed according to key notes sent from another [11], and the live performance of the League of Automatic Music Composers and The Hub [11], [12] was coordinated, without planning or guidance, by narrowing down the movement space of different instruments with high-level parameters. Ideas introduced then, such as networking several computers into an electronic orchestra [11], are today adopted in academic laptop orchestras [13], [14], for which independent conducting software [12] has been developed. Some novel DMIs, such as the reacTable [15], are collaborative by design, yet require mastering interaction with a specific interface. The inconspicuous yet effective control of embedded systems provides tools for extending the functionality of any DMI towards an interconnected ensemble and, further, the internet of musical things (IoMusT) [16].

Layered mapping
Simultaneously controlling the sound synthesis of diverse DMIs isn't trivial. A live performance could include interfaces ranging from a simple one-to-one mapping in embedded acoustic instruments [17] to models of acoustic sound production with cross-coupled controller parameters [5].

Hunt et al. [5] describe a layered mapping strategy, where the controller parameters of the interface aren't directly linked to the sound synthesis parameters but processed through one or more abstraction layers. Thus, sound synthesis parameters such as volume and reverb could be classified under abstraction layers such as loudness and timbre. Notably, the controller parameter values can also be abstracted into meaningful attributes such as "distance between twisted knobs" or "relative pressure of pressed buttons". When designing DMIs, these abstraction layers enable performing complex mappings in an understandable way and are crucial to the functionality of the instrument and the corresponding user experience. [5]

In this work, the abstraction layers are considered as a possibility to connect the sound effects of networked instruments despite differing controller and synthesis parameters. Little previous research has been done on what kind of high-level attributes would be useful for abstracting sound effects and how they should be controlled. An illustrative example would be a one-to-many mapping between a separate but shared interface and all connected instruments, whose audio operating systems could map the value of the high-level control parameter into one or more local synthesis parameter adjustments (Figure 2).

Figure 2. Layered mapping of OSC messages
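As an illustration of the idea in Figure 2, the sketch below (in Python, the language of the probe's scripts) shows how one shared high-level value could fan out into per-instrument synthesis parameters. The attribute names, instrument groupings and parameter addresses are illustrative assumptions, not the probe's actual configuration.

    # Illustrative middle abstraction layer: one shared high-level attribute
    # fans out into several instrument-local synthesis parameters.
    # The groupings and addresses below are examples only.
    ABSTRACTION_LAYER = {
        "timbre": {
            "guitar": ["/guitarix/Treble", "/crybaby/Wah_parameter"],
            "synth":  ["/obxd/Resonance"],
        },
        "loudness": {
            "drums": ["/juicy/volume_envelope_release_time"],
            "bass":  ["/compressor/Threshold"],
        },
    }

    def fan_out(attribute: str, value: float):
        """Translate one high-level value into (instrument, address, value) triples."""
        layer = ABSTRACTION_LAYER.get(attribute, {})
        return [(instrument, address, value)
                for instrument, addresses in layer.items()
                for address in addresses]

    if __name__ == "__main__":
        print(fan_out("timbre", 0.7))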

One way of controlling sound digitally over a network is the Open Sound Control (OSC) protocol [18], supported by various multimedia devices [18] and used in modern DMIs including the reacTable [15]. Although it's currently less common than, say, the MIDI (musical instrument digital interface) protocol, the human-readable messages and customizable namespaces it supports are a developer-friendly solution for building network applications and, perhaps more importantly, also understandable for the end users. As we will notice, this is useful when designing digital applications intended to be customized by musicians themselves.
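For instance, sending a custom high-level OSC message takes only a couple of lines. The sketch below assumes the python-osc package (not named in this work); the address, host and port are examples only.

    # Sending a human-readable OSC message in a custom namespace.
    # Assumes the python-osc package; address, host and port are examples.
    from pythonosc.udp_client import SimpleUDPClient

    client = SimpleUDPClient("192.168.1.42", 9000)   # hypothetical board address and port
    client.send_message("/attribute/texture", 0.65)  # one high-level attribute value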

Conceptualizing sound transformations
All sound effects serve a purpose in music-making. To understand why musicians create and manipulate sound, the notion of sonic narrative [3] can be applied. According to extensive music cognition research, the human mind seeks to find meaning in all sensations of the body, making music highly representational [3]. Recognizing that sound transformations induce ideas and atmosphere in a matter of seconds becomes useful when categorizing them into usable live jamming tools. Questions concerning generalization are relevant: are sound transformations perceived consistently among individuals and in what way, or, can the abstraction layers of sound synthesis parameters be made objective and user-friendly at the same time?

Transcribing the narrative
Electronic music challenges the traditional notation of tonal music which, to an extent, provides an objective way to describe music among acoustic instrumentalists. First, the notes don't imply the physical gestures of digital musicians. Secondly, focusing on the sound spectra within notes requires consideration of units smaller than the note itself [9].


In classical notation, the vocabulary presented in the Harvard Dictionary of Music [19] can be used to describe character and caricature, but in addition to being tied to a context defined by the instrument and the notes, its successful adoption has been found to depend on practical experience, e.g. listening examples [20].

Character and caricature are also present in waveform manipulations and their combinations in a larger ensemble [21]. Indeed, Roads [3] argues that a single sound can entail a sonic narrative, also referred to as the micro-narrative [21]. When moving between and, to an increasing extent, within notes to create either harmonicity, inharmonicity or noise, the note should be recognised as a type of spectrum to share understanding on its function in musical context [9]. Further, means to describe digital sound manipulations between composers, performers and the audience are required. During jamming, musicians encapsulate each of these roles in an emergent manner, yet alignment among performers and their audience isn’t any less important. On the contrary, connecting musical intention and perception is crucial when designing real-time music interfaces to overcome the gulf of execution and evaluation [22].

Spectromorphology
In 1997, the composer Denis Smalley addressed the above transcription issue in his paper [9] on spectromorphology, a framework for describing the listening experience of acousmatic music, where the sound sources aren't identifiable. Since its comprehensive definition based on four decades of electroacoustic musical works, spectromorphology has been taught in academia [23] but also criticised for the lack of practical applications [24]. However, it's not claimed to be a compositional method per se but a descriptive tool for "understanding the structural relations of acousmatic music" [9], i.e. conceptualizing electronic music.

Spectromorphology is concerned with how sounds are shaped over time to build expectation with motion and growth processes. Principles of aural perception apply to acousmatic sounds, including the energy-motion trajectory of sound: the listener links sounds to a physical gesture or ”adding energy to a sounding body” [9]. Electronic sounds can however be either gesture-carried or texture-carried, depending on whether (assumed) physical gesture frames the sound texture or vice versa. [9]

Energy and motion can be created with bidirectional growth and motion processes taking place in the spectral space, which spans the range between the lowest and highest audible sound. Emptiness and plenitude measure how extensively the space is filled and whether there are gaps between the areas different spectromorphologies occupy. Diffuseness and concentration describe whether sound is dispersed or focused on different areas within the space. The space can also be layered into streams and the separating interstices between them. Overlap and crossover describe how these streams move around or across each other to another region, which is central to collective sound transformations and the resulting micro-narrative [21]. [9]

Dilation and contraction alter the sound width, whereas divergence and convergence could be textural growth as well as simultaneous linear movement. Endogeny grows sounds from within, while exogeny adds to the exterior. Motion rootedness describes how heavily the sound shape is rooted to a fundamental note: the seven listed characteristic motions ascend from "pushing" to "flying". Rootedness could be used to build up tension for a pressured onset or to imply termination, either by fading the motion "in the air" or, again, bringing it back to the ground. Many digital sound effects, such as reverb, are inherently non-rooted. Pitch, even when present, may not be perceived due to the overlap or density of sounds. Thus, a distinction between relative and intervallic pitch is made: the former hinders distinguishing intervals while the latter supports their cultural and tonal usage. [9]

Audio descriptors
The CUIDADO [25] project generalized high-level sound content descriptors for audio signals [26], similarly to the MPEG-7 standardization [27]. Other research has analyzed corresponding descriptors in their narrative context, indicating consistency in the representation of emotional states [28] and physical quantities [29]. A case study [30] examined how the emotional intention of professional musicians was conveyed when performing short melodies to an audience. The impact on physical characteristics including dynamics and spectrum was analyzed. In most cases, the intended and perceived emotion matched and many similarities between moods and all analysed variables were found. However, not enough supportive evidence was found to claim generality across all instruments, genres and individuals. A more recent application [31] mapped music pieces into mood development curves in the Arousal-Valence space (Figure 3).

Figure 3. The Arousal-Valence space [28]

The example dataset was constructed by averaging the Arousal-Valence coordinates given by five music professionals to 324 short music fragments. A satisfactory correlation coefficient indicated that, with such a model, different subjects – albeit with a similar cultural background – perceived similar emotions in music samples across genres. However, the study didn't analyze separate audio variables.

Another study reviewed 179 scientific publications on digital mapping strategies for the sonification of physical quantities [29]. Using a bottom-up approach, auditory and physical dimensions were classified into five categories: for the auditory dimensions these were pitch-related, timbral, loudness-related, spatial and temporal. Though pitch was the most commonly applied auditory dimension, relations between physical and auditory categories were found: in “ecological mapping” sounds simulated the underlying physical phenomena. For example, spatial auditory dimensions, e.g. multichannel panning and interaural difference of amplitude, were mainly used to express kinematic quantities such as motion and acceleration. Despite the found regularities, the evaluation of each mapping strategy within the studied literature was argued to be insufficient. [29]

Computer supported collaborative jamming
The characteristics of jamming outline important design goals for the intended computer support system and its design process. Live collaboration with DMIs has the potential to foster musical companionship [32], group creativity [33] and learning [34]. A previous study on computer supported jamming underlines the importance of the felt experience [35]: ideally, a flow state in which "thoughts, feelings, wishes and action are in harmony" [36] and "subjective experience is both differentiated and integrated" [36] encourages musicians to return to and persist in a task [37].

Automated control can bring structure to the constant improvisation [38], which characterizes group creativity along with collaboration and emergence: a single player can’t be responsible for the creativity of the group whose result is “unpredictable, contingent and hard to explain in terms of the group’s components” [33]. While the tendency to attribute group creativity to one person increases when collaboration is coordinated by a single leader [33], equitable computer support could enable emergent role-taking. Research on collaborative digital music making recognises the ability to computationally affect the output of other musicians as a challenge for personal expression or even privacy [39], which should be considered when designing such systems. However, it has also been argued that artists need to be able to let go of full control and allow for more open forms in articulating their vision [40].

THE TECHNOLOGY
The solution consisted of Python scripts run on four Elk boards [41], one for each instrument. Elk is a headless audio operating system providing a custom add-on board (Figure 4) for the Raspberry Pi [42]. The hardware components of the board included inputs for both audio jacks and MIDI instruments. For live jamming, the board's low, one-millisecond round-trip latency was necessary [43], and the possibility to connect boards over a wireless network enabled building a collaborative system. The customized, high-level OSC messages sent from a shared Open Stage Control [44] interface were mapped by the scripts into local OSC messages sent to the corresponding sound source. The customized OSC messages could be sent to all boards by IP multicasting over a wireless network.
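A minimal sketch of such a board-side script is given below, assuming the python-osc package. The addresses, ports and mapping entries are illustrative rather than the probe's actual configuration, and joining the multicast group is omitted in favor of a plain UDP listener.

    # Board-side sketch: listen for shared high-level OSC messages and forward
    # local OSC messages to this board's sound engine.
    # Assumes the python-osc package; all addresses, ports and mapping entries
    # are examples, and multicast group membership is not handled here.
    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer
    from pythonosc.udp_client import SimpleUDPClient

    engine = SimpleUDPClient("127.0.0.1", 24024)  # local sound engine (example port)

    # This board's slice of the layered mapping:
    # high-level attribute address -> [(local parameter address, direction)]
    LOCAL_MAPPING = {
        "/attribute/texture": [("/reverb/Mix", +1), ("/guitarix/Treble", -1)],
        "/attribute/rootedness": [("/guitarix/drive", -1)],
    }

    def forward(address, value):
        """Translate one shared attribute value into local synthesis parameters."""
        for local_address, direction in LOCAL_MAPPING.get(address, []):
            engine.send_message(local_address, value if direction > 0 else 1.0 - value)

    dispatcher = Dispatcher()
    for attribute_address in LOCAL_MAPPING:
        dispatcher.map(attribute_address, forward)

    BlockingOSCUDPServer(("0.0.0.0", 9000), dispatcher).serve_forever()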

The audio OS was configured with a JSON file to play and transform sounds using plugins chosen for each instrument. Open-source plugins were used to add audio effects for all instruments and sound presets for the MIDI signals: the OB-Xd virtual analog synthesizer and the juicysfplugin [45] with a self-constructed soundfont consisting of drum samples.

Figure 4. The Elk board

THE STUDY
Over the course of two months, four musicians participated in three 4-hour workshops and one individual meeting at a music studio in Stockholm, Sweden. The study followed participatory design principles [46] and the goal was to enhance the collaborative realization of a sonic narrative during jamming with reciprocal digital sound transformations. Thus, the users were physically involved in the development process of the final technology probe [10]. Technology probes are simple yet flexible technologies which enable understanding the user needs in a real-world setting, testing the technology and innovating new technologies [10]. All workshops consisted of improvised jamming where the user participation enabled acquiring tacit knowledge beyond the questions asked during the following group discussions [46]. The musicians were aged between 21 and 32 and their instruments were electric guitar, electric bass and two MIDI keyboards for synthesiser (Novation 61SL MkIII) and drums (Arturia MiniLAB MkII). The first workshop approached jamming as bodystorming [47] opportunities with the Elk board, while the second one explored the conditions for them with the Wizard of Oz method [48]. After personalizing the solution for each player during individual meetings, the final technology probe was used in the third workshop.


By recruiting musicians with no shared jamming experience, any established routine and roles in communication could be eliminated, while the diverse control mechanisms of the instruments validated an inclusive design. Due to the outbreak of Covid-19 in Sweden during the study, precautions for increased hygiene and safety of the participants were taken. Data was collected and analyzed by video recording the jamming sessions. Discussions and first-person experiences were documented as field notes and audio recordings. Analysis based on each iteration was applied in the design and brought forward to the following workshop.

Bodystorming
During the first workshop, digital effects weren't used during collaborative jamming, but the musicians were introduced to the concept of sonic narrative. Exercises on how the musicians interacted with conflicting and shared motives, in this case different moods, were explored. Afterwards, the group discussed digital sound effects and their affective associations based on previous experience, and used the Arousal-Valence space for expressing them. After the group discussion, the first prototype made with two Elk boards was presented. With a mobile web interface, the musicians could simultaneously alter the delay of a guitar and a synth oscillator by twisting a knob object. While taking turns trying the prototype, they could ideate use cases and identify possible challenges.

Controlling and combining sound transformations
During the second workshop, all instruments were connected to Elk boards with a set of sound effects, controlled with either the physical knobs and sliders of the Elk board (synth) or a mobile interface representing them (drums, bass and guitar). Improvised jamming was facilitated by providing the musicians with prompts such as "an imagined experience" from PLEX cards [49], designed for playful and collaborative ideation. The goal was to identify useful high-level attributes in a bottom-up manner based on successful combinations of sound effects. Next, the Wizard of Oz method was used to model the automatic balancing of effects by monitoring the effect adjustments of musicians via a laptop interface and immediately altering those of the other instruments. Different approaches to implementing the high-level attributes were realized in a top-down manner, including the growth and motion processes of spectromorphology, moods, physical quantities and even PLEX. The most useful approach and different control mechanisms were discussed afterwards.

Mapping the effects
During the individual meetings, the effect plugins for each instrument were updated to better respond to the desires of the musicians. They could try out different combinations as the developer edited the configuration file and restarted the audio processor multiple times. Once a satisfactory plugin selection and an optimized effect chain had been achieved, the high-level attributes were considered one by one. The final attributes were selected based on how well they could be implemented with the effects.

The most fitting sound synthesis parameters, and how they correspond to the high-level attribute, were identified during improvised play and recorded as field notes. Afterwards, the script for each board could be programmed to perform the mapping. Three of the individual meetings were held at the studio and one was split into two remote meetings of around 30 minutes.

The multi-effect probe
The final workshop consisted of using the built technology probe during an improvised performance. As preparation, the group ideated a narrative to guide the performance by drawing four arbitrary PLEX cards. Based on the experience from the previous workshop, only one laptop interface was used for controlling the high-level attributes instead of multiple mobile phones. During the performance, the musicians took the role of the controller based on who had drawn the card guiding the current part of the story. Individual sound transformations weren't used, to eliminate any effects outside the high-level attributes. After finishing the first improvised performance, the experience was discussed. Finally, the musicians returned to the studio to experiment with the system and different control strategies and to make additional remarks.

CRAFTING THE PROBE
Defining the attributes
One of the main challenges was defining high-level attributes that would be beneficial during improvised jamming, possible to implement with all instruments and continuously adjustable. While the second workshop was dedicated to exploring possibilities, the unfamiliarity of the available effects, the time limit and a few technical setbacks restricted the options. During play, the musicians would first tweak their own effects and afterwards start listening to others. Pleasant effect combinations were found, for example when playing to "an imagined experience" with a high cutoff, low frequency and a delayed attack for the drums, high resonance and cutoff for the synth, increased compressor and reverb for the bass and high muffle for the guitar. However, since the focus was on the end result rather than the process of transformation, deriving continuous attributes from these combinations didn't make sense.

The top-down approach guided the definition of more generic sound attributes. Moods were, in the end, considered both too vague to be mapped to specific effect combinations and too dependent on other aspects of the music. Physical quantities, on the other hand, were more absolute and thus less usable, because they wouldn't allow for much contrast between instruments. Thus, the terminology provided by spectromorphology was selected as the most suitable approach for controlling sound effects on a higher level. Even though the terms required the most explanation, a shared understanding of the underlying processes guided the musicians to treat sound transformations as compositional tools rather than discrete settings.


The final attributes were narrowed down during the individual meetings based on how well they could be implemented, although liberties were taken. In total, seven continuous high-level attributes with possible values between 0.0 and 1.0 were mapped across instruments: gesture framing, adjusting the sound color of gesture-carried sounds; texture setting, adjusting the textural contour of sounds; contraction-dilation, adjusting the width of sounds; motion rootedness, adjusting the weight of sounds; pitch, varying between relative and intervallic; endogeny-exogeny, applying effects within or between sounds; and divergence-convergence, adjusting the contrast of timbre across instruments. The goal was that the adjustment of each high-level attribute would represent the corresponding process, depending on the direction. In practice, the mapping of certain high-level attributes overlapped due to the limited amount of available effects per instrument. The mapping of the high-level attributes across instruments is shown in Table 1, where extracts of the OSC messages related to sound synthesis parameters are written as such. Positive values indicate that the adjustment is directly proportional to that of the parent attribute, whereas the adjustment of negative values is inversely proportional. The optional range defines the minimum and maximum values for the related sound synthesis parameters when adjusting the high-level attribute.

Gesture
  Drums:  + /juicy/low-pass_filter_cut-off_requenc, + /juicy/low-pass_filter_resonance_attent (0.5)
  Bass:   + /reverb/DryWet_Mix, + /phaser/Bypass
  Guitar: + /guitarix/Treble (0.3 - 1.0)
  Synth:  + /obxd/Resonance

Texture
  Drums:  + /juicy/volume_envelope_release_time, + /reverb/Mix
  Bass:   - /phaser/Bypass, + /phaser/Speed, + /phaser/Feedback_Gain (0.9), + /reverb/DryWet_Mix (0.4 - 0.6)
  Guitar: - /crybaby/Bypass, - /guitarix/Treble, + /reverb/Mix, + /reverb/Predelay, + /guitarix/drive
  Synth:  - /bitcrush/resolution, + /parameter/obxd/Portamento

Dilation
  Drums:  + /reverb/Mix (0.0 - 0.8), + /reverb/Decay
  Bass:   - /reverb/DryWet_Mix, + /reverb/In_Delay (0.0 - 0.8), + /reverb/Low_RT60, + /reverb/Mid_RT60, + /reverb/Eq1_Level, + /reverb/Eq2_Level
  Guitar: + /reverb/Mix, + /guitarix/Treble (0.2 - 1.0), + /crybaby/Wah_parameter
  Synth:  + /reverb/Bandwidth, + /reverb/Size, + /reverb/Density, + /reverb/Mix

Rootedness
  Drums:  - /eq/Low, + /reverb/Decay
  Bass:   - /compressor/Threshold (0.5 - 1.0), - /reverb/DryWet_Mix (0.2 - 0.5), + /phaser/Bypass, + /reverb/Low_RT60 (0.0 - 0.5), + /reverb/Mid_RT60 (0.0 - 0.5), + /reverb/Eq1_Level, + /reverb/Eq2_Level
  Guitar: - /guitarix/Treble (0.0 - 0.5), - /guitarix/Middle (0.0 - 0.5), - /guitarix/drive, - /crybaby/Bypass (0.5 - 1.0), + /crybaby/Wah_parameter, + /reverb/Mix
  Synth:  - /bitcrush/resolution, - /obxd/Resonance, + /reverb/Mix, + /reverb/Decay

Pitch
  Drums:  - /eq/Low
  Bass:   - /phaser/Invert_Internal_Phaser_Sum, - /phaser/Vibrato_Mode, - /reverb/In_Delay (0.0 - 0.8), + /reverb/Eq2_Level, + /phaser/Speed (0.3 - 0.5)
  Guitar: - /guitarix/drive
  Synth:  - /obxd/Xmod, - /obxd/NoiseMix

Exogeny
  Drums:  + /eq/Low (0.5 - 1.0), + /reverb/Mix, + /reverb/Density, + /reverb/Decay, + /reverb/Predelay
  Bass:   - /phaser/Invert_Internal_Phaser_Sum, - /reverb/DryWet_Mix
  Guitar: - /crybaby/Wah_parameter, - /guitarix/Treble
  Synth:  + /obxd/Osc1Pitch

Convergence
  Drums:  + /juicy/volume_envelope_attack_time (0.0 - 0.3), + /juicy/volume_envelope_sustain_attentua (0.0 - 0.5), + /juicy/volume_envelope_decay_time (0.0 - 0.5), + /juicy/volume_envelope_release_time (0.0 - 0.5), + /reverb/Mix (0.0 - 0.4)
  Bass:   - /compressor/Threshold (0.7 - 1.0), + /parameter/reverb/DryWet_Mix (0.5)
  Guitar: - /crybaby/Wah_parameter, - /crybaby/Bypass, - /guitarix/drive, + /reverb/Mix
  Synth:  - /obxd/VoiceDetune, - /reverb/Bandwidth, + /reverb/Mix

Table 1. Mapping of the high-level attributes
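Read as a recipe, each table cell scales the shared attribute value into the given optional range, directly or inversely proportionally. A small sketch of this interpretation, using one cell of Table 1 as an example, could look as follows (illustrative only, not the probe's actual code).

    # Sketch of how one table cell could be interpreted: a high-level attribute
    # value in [0.0, 1.0] is scaled into the optional parameter range, directly
    # (+) or inversely (-) proportionally.
    def scale(attribute_value: float, direction: int,
              low: float = 0.0, high: float = 1.0) -> float:
        """Map a 0.0-1.0 attribute value to a synthesis parameter value."""
        v = attribute_value if direction > 0 else 1.0 - attribute_value
        return low + v * (high - low)

    # Example cell from Table 1: "+ /guitarix/Treble (0.3 - 1.0)" under Gesture.
    print(scale(0.5, +1, 0.3, 1.0))  # -> 0.65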


Because of the overlapping effects of some attributes, it was found cumbersome that the GUI didn’t match the current state of sound synthesis at all times: for example, decreasing texture setting would decrease reverb of the drums, yet dilation, also responsible for reverb, would appear to remain at the highest value. However, the perceived jumps within single synthesis parameters remained relatively subtle and the ability to override current sounds with another attribute also guaranteed that each adjustment had an impact. Even though the attributes were found “fluid in transitioning” according to the synth player, the drummer would’ve hoped for “a reset point as some idle state of all the attributes”.

Some attributes divided opinions among the musicians: "I felt a bit uncomfortable with convergence. I lost all my kick (attack) so I felt like missing a part of my instrument", said the drummer. In addition, the naming of some attributes was counterintuitive: motion rootedness would decrease as its value increased, because initially the slider represented characteristic motions from "dragging" to "flying". Some of the attributes were more useful than others, either because their implementation was more interesting or because it was too similar to that of another attribute. Each musician could identify a favorite attribute after the performance: dilation-convergence for the bassist, endogeny-exogeny for the synth player and texture setting for both the drummer and the guitarist.

Controlling the attributes
Controlling the effects was found complicated during the second workshop, where many instances of the same interface were used with mobile phones: several people interacting at the same time caused technical issues as well as confusion in role-taking. A single interface, where one player at a time has the control, was suggested for the final performance. Open Stage Control was chosen over a more physical interface, such as one of the Elk boards, as it enabled faster mapping on the go and using a laptop touchpad with a self-explanatory GUI (Figure 5). While role-taking was indeed clarified, accessing the laptop wasn't easy: the guitarist preferred the attributes being controlled by someone else for this reason.

Figure 5. Sliders for controlling the high-level attributes

He also "would've needed more time to get to know the attributes", saying: "When it was my time to control I was really searching the options". The drummer said: "Focusing on playing was easier when someone else was controlling, but I really enjoyed having the control, it's kind of like having another instrument", to which the bassist concurred.

The synth player "couldn't always find what he wanted", but also pointed out that "not being able to see what effects were in place made adapting a bit more difficult". Thus, in addition to the laptop, mobile interfaces were brought back after the final performance for further exploration (Figure 6). The simplicity of the user interface raised ideas such as a third party, i.e. the audience, being able to control the effects. On the other hand, automating the attribute control during a non-improvised performance in sync with lights or other visual elements was suggested.

Figure 6. Controlling the attributes with two interfaces

Realizing the sonic narrative
Digital sound effects were found very useful for creating narratives already during the first workshop, where none were used except the synth player's pitch modulation pad. All musicians would've preferred accessing sound effects for increased expressivity. The synth player explained: "I used the pitch modulation to express silliness and repetition for intense. Cheerful was an octave higher." When evaluating the affective content of sound manipulations based on previous experience, the bass player thought: "something silly would be like an envelope filter or a wah (pedal), if you want to do something more melancholic you use a chorus". When sound effects were tried out during the second workshop, they strongly guided the jamming of individual players. This hindered collaboration, because each musician prioritized adapting their jamming to the effect in place over listening to others. After using the technology probe during an improvised performance in the third workshop, the drummer described the experience: "The high-level attributes give possibilities to try something different. When I was in control there were new ways of changing the discourse of the music. Affecting everyone with one knob definitely makes you focus more on the others. The difference is that you start listening as a whole instead of every single instrument separately, like how everything sounds together. This is different from traditional jamming. It turns everything into one." At the same time, he noted that "the (planned) sonic narrative was very helpful because it guided them to look for something, not just to explore".


All of the players agreed that "it was relatively easy to adapt (to the effects)". Thus, controlling the high-level attributes significantly improved focusing on the realization of a shared sonic narrative, strongly motivated by the collective sound transformations.

Finding the flow
Reaching a flow-like state, or a collective groove, was an ongoing theme throughout the workshops, essentially used to measure the user experience of the system. During the first workshop, it was found that conflicting motives hindered the attention of the players, resulting in discouragement: when each player was trying to convey their own moods, the guitarist "felt so stupid when nobody was following him" but was able to "lose himself" when the group was presented with a shared motive. During the second workshop, sound transformations and effects were confirmed to serve as motives for individual players, and thus caused similar conflicts when not balanced. In addition, having to control separate parameters was laborious: the bassist explained how "having to remember a previously found effect combination even for one instrument was difficult and looking for it frustrating".

Regarding this evaluation criterion, all musicians agreed that the performance with the probe was by far the most successful. The players were the most immersed during play and the most pleased with both the communication and the end result. The bassist shared the data collector's observation on "how the rhythms, the melodies, the dynamics and the timbre were in harmony". Connecting sound effects with collective motion and growth processes was seen as a central contributor to this.

Figure 7. Jamming with the technology probe

DISCUSSION
The niche yet complex design scenario of controlling reciprocal sound effects was examined using a flexible technology probe during collaborative jamming with DMIs. Several user needs for controlling sound effects with shared, high-level sound attributes were identified with participatory design principles. Three research questions were addressed:

● How can high-level sound attributes provide an accessible balance across sound effects during collaborative jamming?

● How does controlling such attributes impact the performance?

● How could useful attributes be generalized for all digital instruments?

Accessible balance across sound effects was achieved by mapping OSC messages in multiple layers. The values of the high-level attributes were represented with customizable OSC messages and sent, by IP multicasting, from a shared interface to the four Elk boards processing the sound of each instrument. Within each board, the continuous value adjustments of the high-level attributes were mapped into a set of local OSC messages adjusting sound synthesis parameters. Each local mapping was separately defined and not necessarily directly proportional to the parent value. In total, seven continuous high-level attributes with values between 0.0 and 1.0 could be controlled with sliders on a laptop or mobile phone. One of the main findings was to approach the effect adjustments themselves as constant sonic motion and growth processes rather than to focus solely on static effect combinations.

The impact of controlling shared attributes was evident, as not only the sound but also the motives of the ensemble were unified. The fact that timbral and dynamic transformations effectively inspired the creation of melodies and rhythms caused difficulties during the second workshop, where sound manipulations were independent of each other. Listening to other players was secondary, as musicians focused on matching their playing with the digitally processed mirror image [8] of it. The bottom-up categorization of auditory dimensions [29], where six of them belonged to several categories, for example harmony being both pitch-related and timbral, supports the idea that timbral adjustment ties into melody creation. However, when effects were automatically aligned, the focus during their adjustment shifted from individual instruments to the ensemble as a whole. After the players started listening to the music in a different way, they could better find their role as a part of a larger entity and realize that through playing. This phenomenon addresses the structural levels described by Smalley [9]: for a pleasant listening experience, the sound hierarchies within a work should vary and fluctuation between low-level and high-level interest occur [9]. Controlling sound effects with the high-level sound attributes coordinated this variation by collectively directing the attention of the musicians.

Allowing only one performer to control the attributes at a time, and clearly indicating whose turn it was, structured the jamming process and fostered group creativity [33]. Delegating the control of the high-level attributes to one member gave him more expressivity, while the other musicians could better focus on jamming. Moreover, switching the role of the controller served as an important communication cue [33] regarding the sonic narrative.


The musicians found the interplay between the collaboratively ideated sonic narrative and altering the discourse of the music with the different attributes seamless.

To define general high-level sound attributes, the group opted for a top-down approach to deriving the instrument-related sound effects, applying the spectromorphology framework. Spectromorphology was chosen over the other considered options because it combines natural language and the properties of sound manipulations with a compositional context – building expectation with constant sonic motion and growth processes. While the selected processes, explained in Smalley's paper [9], were somewhat flexible in terms of implementation, their generality could be questioned for the same reason. Moreover, the ambiguity of natural language, the mapping strategy and the partly arbitrary sound effects may have resulted in misinterpretation of the terms. Another study [23], where YouTube comments on a piece of electronic dance music were examined against Smalley's models of spectromorphology and space-form [50], addressed the same issue, but also suggested that focusing on how sound shapes evoke emotions should be emphasized over the highly subjective content of private experiences. Conjuring "affective alliances" between different players may then be enhanced with a contextual terminology rather than one that depends on a fixed vantage point [23]. Similarly, abstracting sound attributes with fixed sound effects for all instruments and users might not be reasonable if creative and personal storytelling is to be ensured. However, a shared understanding of the intended purpose of the sound transformations most likely contributed to the coordination and positive outcome of the final performance. Based on the experience gained from working with the musicians during the workshops, benchmarks that inspire the end users and facilitate their tasks when customizing transformations for their own needs are useful. In practice, including abstraction layers at different levels, where some are customizable and others objectively deductible, could be useful when mapping collective sound effects. Possible categories include spectral (derived from e.g. waveform graphs), functional (for building expectation or transitioning) and physical (quantities imposed on the interface). Naturally, the user interface for performing the mapping should clearly indicate the level and type of each layer, for either cross-coupling or differentiating between their control during live play. Based on this study, non-traditional ways to conceptualize music are useful, if not necessary, for collaborative live control of sound effects.

Limitations
During the study, each musician was only responsible for playing one "track" to impose effects on, even though single DMIs can allow for more. In addition, no local effect adjustments were used while controlling the high-level attributes.

Considering these aspects would complicate the mapping process. During the final workshop, only one mapping strategy could be explored with the technology probe. Finally, jamming experience gained from the first two workshops may have enhanced the positive outcome when using the technology probe.

Future work
Designing the interaction with the suggested control system requires careful consideration. It should be better integrated into playing but also clearly indicate role-taking. For facilitating the effect mapping, structuring available audio effects according to different metrics should continue. Reviewing studies such as the CUIDADO project [26] more extensively could help construct comprehensive taxonomies of digital sound transformations. Finally, the interface of the system should be made responsive with two-way communication between the boards and the control objects. This is especially important if local sound transformations are combined with the collective ones.

CONCLUSION
In this work, a technology probe for controlling reciprocal sound effects across digital musical instruments with dynamic high-level sound attributes was co-designed with four musicians. As a result of participatory design workshops, seven continuous high-level attributes were constructed drawing upon the processes of the spectromorphology framework, implemented with layered mapping, and used during collaborative live jamming. The workshops shed light on how jamming is shaped by digital sound transformations and why connecting them in a collaborative setting can support group creativity. Reciprocal sound transformations coordinated the jamming towards a congruent sonic narrative and an immersive experience by providing shared motives for the ensemble, shifting the focus of the musicians between the structural levels of the music and enabling role-taking as a communication cue. While deriving the instrument-related sound synthesis of each high-level attribute was restricted to a single use case and thus more intuitive than systematic, the results indicated that the overlapping attributes were reliable and expressive, yet their usability suffered from a non-responsive user interface. For seamless interaction, clearly allocating and effortlessly accessing the control of the attributes was found crucial. Through the analysis, it could be concluded that transparency and a mutual understanding of how sounds were being shaped significantly increased the usability of sound effects during jamming, and that systematic categorization of sound transformations is likely to facilitate the customization of useful high-level sound attributes.

ACKNOWLEDGMENTS
I thank all the musicians who actively took part in this project, and my supervisors, Anders Lundström and Ilias Bergström, who provided continuous support and valuable feedback on previous versions of this document.


REFERENCES
[1] H. E. Tez and N. Bryan-Kinns, "Exploring the Effect of Interface Constraints on Live Collaborative Music Improvisation," NIME, p. 6, 2017.

[2] E. Varèse and C. Wen-chung, “The Liberation of Sound,” Perspect. New Music, vol. 5, no. 1, pp. 11–19, 1966, doi: 10.2307/832385.

[3] C. Roads, Composing Electronic Music: A New Aesthetic. Oxford University Press, 2015.

[4] “Créer de la musique avec Live et Push | Ableton.” https://www.ableton.com/ (accessed May 25, 2020).

[5] A. Hunt, M. M. Wanderley, and M. Paradis, “The Importance of Parameter Mapping in Electronic Instrument Design,” J. New Music Res., vol. 32, no. 4, pp. 429–440, Dec. 2003, doi: 10.1076/jnmr.32.4.429.18853.

[6] I. J. C. Tobias and M. L. Denman, “System for synchronizing a midi presentation with presentations generated by other multimedia streams by means of clock objects,” US5530859A, Jun. 25, 1996.

[7] “Ableton Link: Connect music making apps with Ableton Live | Ableton.” https://www.ableton.com/en/link/ (accessed Mar. 06, 2020).

[8] “Combining the Acoustic and the Digital: Music for Instruments and Computers or Prerecorded Sound - Oxford Handbooks.” https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199792030.001.0001/oxfordhb-9780199792030-e-009 (accessed Mar. 29, 2020).

[9] D. Smalley, “Spectromorphology: explaining sound-shapes,” Organised Sound, vol. 2, no. 2, pp. 107–126, Aug. 1997, doi: 10.1017/S1355771897009059.

[10]H. Hutchinson et al., “Technology probes: inspiring design for and with families,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Ft. Lauderdale, Florida, USA, Apr. 2003, pp. 17–24, doi: 10.1145/642611.642616.

[11]“INDIGENOUS TO THE NET.” http://crossfade.walkerart.org/brownbischoff/IndigenoustotheNetPrint.html (accessed Mar. 25, 2020).

[12] J. M. Comajuncosas and E. Guaus, "Conducting Collective Instruments: A Case Study," NIME, p. 4, 2014.

[13] D. Trueman, "Why a laptop orchestra?," Organised Sound, vol. 12, no. 2, pp. 171–179, Aug. 2007, doi: 10.1017/S135577180700180X.

[14]G. Wang, N. Bryan, J. Oh, and R. Hamilton, “Stanford Laptop Orchestra (SLOrk),” p. 5, 2009.

[15]M. Kaltenbrunner, S. Jorda, G. Geiger, and M. Alonso, “The reacTable*: A Collaborative Musical Instrument,” in 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE’06), Manchester, UK, 2006, pp. 406–411, doi: 10.1109/WETICE.2006.68.

[16]L. Turchet, C. Fischione, G. Essl, D. Keller, and M. Barthet, “Internet of Musical Things: Vision and Challenges,” IEEE Access, vol. 6, pp. 61994–62017, 2018, doi: 10.1109/ACCESS.2018.2872625.

[17]M. Blessing and E. Berdahl, “The JoyStyx: A Quartet of Embedded Acoustic Instruments,” NIME, p. 4, 2017.

[18]M. Wright, “Open Sound Control: an enabling technology for musical networking,” Organised Sound, vol. 10, no. 3, pp. 193–200, Dec. 2005, doi: 10.1017/S1355771805000932.

[19]W. Apel, The Harvard Dictionary of Music. Harvard University Press, 2003.

[20]J. W. Cassidy and D. R. Speer, “Music Terminology: A Transfer from Knowledge to Practical Use,” Bull. Counc. Res. Music Educ., no. 106, pp. 11–21, 1990.

[21]M. Back and D. Des, “Micro-narratives in sound design: Context, character, and caricature in waveform manipulation,” Nov. 1996, Accessed: Apr. 03, 2020. [Online]. Available: https://smartech.gatech.edu/handle/1853/50810.

[22]E. L. Hutchins, J. D. Hollan, and D. A. Norman, “Direct Manipulation Interfaces,” Human–Computer Interact., vol. 1, no. 4, pp. 311–338, Dec. 1985, doi: 10.1207/s15327051hci0104_2.

[23]E. K. Spencer, “Re-orientating Spectromorphology and Space-form through a Hybrid Acoustemology,” Organised Sound, vol. 22, no. 3, pp. 324–335, Dec. 2017, doi: 10.1017/S1355771817000486.


[24]J. Young, “Sound in structure: Applying spectromorphological concepts.,” 2005, Accessed: May 14, 2020. [Online]. Available: https://dora.dmu.ac.uk/handle/2086/4756.

[25]H. Vinet, P. Herrera, and F. Pachet, “The CUIDADO Project,” in International Conference on Music Information Retrieval, Paris, France, Oct. 2002, pp. 197–203, Accessed: Jun. 18, 2020. [Online]. Available: https://hal.archives-ouvertes.fr/hal-01250799.

[26]G. Peeters, “A large set of audio features for sound description (similarity and classification) in the CUIDADO project,” Jan. 2004.

[27]B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to MPEG-7: Multimedia Content Description Interface. John Wiley & Sons, 2002.

[28]Y. E. Kim et al., “MUSIC EMOTION RECOGNITION: A STATE OF THE ART REVIEW,” p. 12, 2010.

[29]G. Dubus and R. Bresin, “A Systematic Review of Mapping Strategies for the Sonification of Physical Quantities,” PLOS ONE, vol. 8, no. 12, p. e82491, Dec. 2013, doi: 10.1371/journal.pone.0082491.

[30]A. Gabrielsson and P. N. Juslin, “Emotional Expression in Music Performance: Between the Performer’s Intention and the Listener’s Experience,” Psychol. Music, vol. 24, no. 1, pp. 68–91, Apr. 1996, doi: 10.1177/0305735696241007.

[31]J. Grekow, “Music Emotion Maps in Arousal-Valence Space,” in Computer Information Systems and Industrial Management, vol. 9842, K. Saeed and W. Homenda, Eds. Cham: Springer International Publishing, 2016, pp. 697–706.

[32] R. MacDonald et al., Eds., Musical Communication. Oxford University Press, 2005.

[33]R. K. Sawyer, “Group creativity: musical performance and collaboration,” Psychol. Music, vol. 34, no. 2, pp. 148–165, Apr. 2006, doi: 10.1177/0305735606061850.

[34]M. Biasutti and L. Frezza, “Dimensions of Music Improvisation,” Creat. Res. J., vol. 21, no. 2–3, pp. 232–242, May 2009, doi: 10.1080/10400410902861240.

[35] B. Swift, "Chasing a Feeling: Experience in Computer Supported Jamming," in Music and Human-Computer Interaction, S. Holland, K. Wilkie, P. Mulholland, and A. Seago, Eds. London: Springer London, 2013, pp. 85–99.

[36]C. R. Snyder and S. J. Lopez, Oxford Handbook of Positive Psychology. Oxford University Press, 2009.

[37]L. A. Custodero, “Seeking Challenge, Finding Skill: Flow Experience and Music Education,” Arts Educ. Policy Rev., vol. 103, no. 3, pp. 3–9, Jan. 2002, doi: 10.1080/10632910209600288.

[38]B. E. Benson, The Improvisation of Musical Dialogue: A Phenomenology of Music, 1st ed. Cambridge University Press, 2003.

[39]R. Fencott and N. Bryan-Kinns, “Computer Musicking: HCI, CSCW and Collaborative Digital Musical Interaction,” in Music and Human-Computer Interaction, S. Holland, K. Wilkie, P. Mulholland, and A. Seago, Eds. London: Springer, 2013, pp. 189–205.

[40]“Facilitating collective musical creativity | Proceedings of the 13th annual ACM international conference on Multimedia.” https://dl.acm.org/doi/abs/10.1145/1101149.1101177 (accessed Apr. 01, 2020).

[41]“Elk new website,” Elk Audio OS. https://elk.audio/ (accessed Mar. 21, 2020).

[42]“Teach, Learn, and Make with Raspberry Pi – Raspberry Pi.” https://www.raspberrypi.org (accessed Mar. 21, 2020).

[43]R. H. Jack, T. Stockman, and A. McPherson, “Effect of latency on performer interaction and subjective quality assessment of a digital musical instrument,” in Proceedings of the Audio Mostly 2016, Norrköping, Sweden, Oct. 2016, pp. 116–123, doi: 10.1145/2986416.2986428.

[44]“Open Stage Control.” https://openstagecontrol.ammd.net/ (accessed Mar. 23, 2020).

[45] Birch-san, Birch-san/juicysfplugin. 2020.

[46] L. Elblaus, K. F. Hansen, and C. Unander-Scharin, "Artistically Directed Prototyping in Development and in Practice," J. New Music Res., vol. 41, no. 4, pp. 377–387, Dec. 2012, doi: 10.1080/09298215.2012.738233.

[47]D. Schleicher, P. Jones, and O. Kachur, “Bodystorming as embodied designing,” Interactions, vol. 17, no. 6, pp. 47–51, Nov. 2010, doi: 10.1145/1865245.1865256.

[48]S. Dow, B. MacIntyre, J. Lee, C. Oezbek, J. D. Bolter, and M. Gandy, “Wizard of Oz


TRITA-EECS-EX-2020:429

www.kth.se