Interactive Sound
Generative approaches from computation and cognitive sciences
How can we usefully structure real-time interactive music generation systems? I am considering here two kinds of situation in which many of us practice music-making. First, the juxtaposition of acoustic instruments (the piano in my own case) with computational sound generation processes. Second, purely electronic computer-interactive performance. In the first case, a key feature is that musical strands generated or realized by a performing musician feed into the performance, and potentially into the computer-interactive system. In the second case, a musician can develop substantial facility with the computer-interactive interface itself, using gesture, controllers or voice, but only with the voice is their input itself a musical stream, one which can contribute immediately to the overall sound stream. 1[1. See, for example, the works of Pamela Z discussed in (Lewis 2007) and Stefano Fasciani’s article in this issue of eContact!, “’One at a Time by Voice’: Performing with the voice-controlled interface for digital musical instruments.”]
In both situations, many real-time computational approaches are applicable; and a key distinction seems to me, as just implied, to be the nature of the input stream that the performer provides: whether primarily musical, or a stream of generative or controller information. After the stage of musical stream input, the opportunities of the two situations are essentially identical, and these opportunities are my topic. So I focus on algorithms which may be used in acoustic, purely electronic or computational, or hybrid circumstances. I consider in particular “live algorithms”, those whose path can be perturbed in flight, for example those found in Oliver Bown’s Zamyatin software system (Bown 2011). One might say: those algorithms with chinks in their black-box armour. I make some contrasts with live coding, where at the outset there is generally little or no armour, no carapace, rather a jelly of wobbling potential.
So my emphasis is on real-time interactive work, but many, if not most, of the concepts discussed can equally be used for compositional processes that take place out of hearing of anyone other than the creator(s), and which are edited or extended before presentation to an audience, or are intended for recorded presentation only.
Traditional Algorithmic Approaches to Music Creation
The Realization of Mathematical Formulations as Music
The pages of the prestigious Computer Music Journal are strewn with articles on specific mathematical ideas that have been embodied in music: for example, power functions, fractals, chaotic functions and mathematics derived from kinematics or other aspects of physics and organismal movement. Gerhard Nierhaus’ book on algorithmic composition (2009) provides an extensive review of such algorithmic approaches, though he seems to conclude that they may be restricted in impact and/or longevity. On the other hand, Andrew Brown and Andrew Sorensen indicate that they regularly find a particular set of algorithms valuable in their generative work in live coding: “probability, linear and higher order polynomials, periodic functions and modular arithmetic, set and graph theory, and recursion and iteration” (Brown and Sorensen 2009, 25). Of particular note amongst the many contributions to the field over the past 25 years or so are those by Jeff Pressing, whose appreciation of motor and cognitive science issues always informed the way in which he applied such algorithms to make music (Pressing 1990), foreshadowing the more recent approaches I have turned to, as we will see below.
Embodying a Compositional Approach That Is Not Explicitly Mathematical in an Empirically-Derived Algorithm
This endeavour has a long history, including the work of people such as Kemal Ebcioğlu and David Cope. Both created models of prior music, but had to extract or formulate the model and then implement it for generative use. Ebcioğlu formulated a probabilistic mechanism to generate harmony sequences allied to those of Bach Chorales (Ebcioğlu 1988). Cope also sought to recreate styles of earlier composers, from Mozart to Prokofiev and Joplin, by statistical transitions (Cope 2001). A published patent of his also describes “retrograde recombination”, in which patterns are recombined, normally in reverse, to generate a kind of remix, a new piece, focused mainly on pitch and rhythmic structure (Cope 2010). This is of course somewhat like mosaicing 2[2. See Diemo Schwarz’s article in this issue of eContact!, “Interacting with a Corpus of Sounds.” Related ideas are also reviewed by Casey et al. (2008).] or remixing, but perhaps with an emphasis on larger, more unified blocks of material.
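As an illustration only (and not Cope’s patented algorithm), here is a minimal sketch of recombination: short patterns are cut from a source melody, re-chained wherever their boundary pitches match, and the result is optionally presented in retrograde. The function names and the toy melody are my own.

```python
# A minimal, hypothetical sketch in the spirit of pattern recombination (it is
# not Cope's published algorithm): a pitch sequence is cut into short patterns,
# patterns are re-chained wherever their boundary pitches match, and the
# result may be presented in retrograde (reversed).
import random

def segment(pitches, size=4):
    """Cut a pitch list into overlapping patterns of a fixed size."""
    return [pitches[i:i + size] for i in range(len(pitches) - size + 1)]

def recombine(pitches, length=16, size=4, retrograde=True, seed=None):
    rng = random.Random(seed)
    patterns = segment(pitches, size)
    out = list(rng.choice(patterns))
    while len(out) < length:
        # Prefer patterns whose first pitch matches the current last pitch.
        candidates = [p for p in patterns if p[0] == out[-1]] or patterns
        out.extend(rng.choice(candidates)[1:])
    out = out[:length]
    return out[::-1] if retrograde else out

if __name__ == "__main__":
    source = [60, 62, 64, 65, 67, 65, 64, 62, 60, 67, 65, 64]  # MIDI pitches
    print(recombine(source, length=16, seed=1))
```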
Computational Approaches Based on a Compositional Technique
There have been some recent attempts to use the procedures of serial composition in algorithmic electronic music composition. Techniques referred to as “serial” constitute a particularly rigorous system in which pitch sequences are manipulated algorithmically. In “total serialism”, this principle is extended to sequences of other musical parameters, which are subjected to similar transformations. In fact, in 2013, members of a class at Columbia University wrote code for these serial pitch transformations, which are quite easy to achieve. But what is interesting about serial composition with pitch for our needs is the way that different realizations of the sequential progression (the “horizontal” temporal pitch succession) are combined with vertical integration. In this integration, chords are formed and several strands of the Prime and its derivatives may occur in any possible juxtaposition; note repetitions and transpositions of the series are also important. I have written a live algorithm, the Serial Collaborator, to make multi-part piano music using these principles but with a range of interactivity permitted; for example, controlling the degree of overlap of different series versions by varying the note density of chords and their frequency (Dean 2014a). These principles have many interesting future applications for live interactive performance, focusing on pitch, rhythm, timbre, spectral density, spatialization or a virtually infinite choice of other salient musical features. So far, I have used this software in live performance (Audio 1) and to provide material within soundscapes in installation work in collaboration with Keith Armstrong and colleagues. 3[3. See in particular the works Finitude (2011) and Long Time, No See (2013) on Keith Armstrong’s website.]
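The basic pitch transformations are indeed easy to code. The following sketch (my own illustration, not the Serial Collaborator nor the Columbia class code) produces the Prime, Inversion, Retrograde and Retrograde-Inversion forms of a row, plus transposition, as pitch classes 0–11.

```python
# A minimal sketch of the classic serial pitch transformations (not the Serial
# Collaborator itself): prime, inversion, retrograde, retrograde-inversion and
# transposition, operating on pitch classes 0-11.
def transpose(row, n):
    return [(p + n) % 12 for p in row]

def inversion(row):
    # Reflect each pitch class around the first pitch of the row.
    return [(2 * row[0] - p) % 12 for p in row]

def retrograde(row):
    return row[::-1]

def retrograde_inversion(row):
    return retrograde(inversion(row))

if __name__ == "__main__":
    prime = [0, 11, 3, 4, 8, 7, 9, 5, 6, 1, 2, 10]  # an arbitrary 12-note series
    print("P0: ", prime)
    print("I0: ", inversion(prime))
    print("R0: ", retrograde(prime))
    print("RI0:", retrograde_inversion(prime))
    print("P5: ", transpose(prime, 5))
```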
It seems that quite considerable benefits often arise when generative code is “squeezed” into a new context or use. This is the case with the Serial Collaborator, as it can operate on any note sequence, not solely genuine 12-note series. When presented with melodies that correspond to a specific major scale, for example, the transformations can be made to fit entirely within a related major scale. Thus the ascending C-Major scale, when inverted, becomes the descending Ab-Major scale starting from its third degree. 4[4. This is a technique well known to composers such as Jacob Obrecht (1458–1505) of the Dutch school of Renaissance music. For various reasons, Obrecht had less influence on subsequent composers than, for example, his contemporary Ockeghem, and his combinatorial and transformational methods have not been commonly exploited since.] Such techniques yield contrasting outputs from the Serial Collaborator, especially when probabilistic small transpositions of individual notes are also allowed within the major-scale materials, providing inflections somewhat equivalent to passing and grace notes.
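The scale example can be checked in a few lines. This small sketch (again my own illustration, not code from the Serial Collaborator) reflects the ascending C-Major scale around its first note and confirms that every resulting pitch belongs to Ab Major, descending from C, the third degree of that scale.

```python
# A quick check of the inversion example: reflecting the ascending C-Major
# scale around its first note (C = MIDI 60) yields a descending line whose
# pitch classes all belong to Ab Major.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]            # C D E F G A B C
AB_MAJOR_PCS = {8, 10, 0, 1, 3, 5, 7}                 # Ab Bb C Db Eb F G

inverted = [2 * C_MAJOR[0] - p for p in C_MAJOR]      # 60 58 56 55 53 51 49 48
print(inverted)                                       # descending from C
print(all(p % 12 in AB_MAJOR_PCS for p in inverted))  # True
```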
Statistical Models of Music or Musical Structure
An alternative approach to generative principles is based on statistical analysis of musical features, at the micro- (event by event), meso- (section by section) or macro- (whole piece) levels. The micro-approach is dominated by information about statistical pitch structures. Bayesian and information-theoretic approaches allow the analysis of sequential patterns of pitches, in principle using sequences of any length (although sequences of more than 10 pitches seem to be the most informative). These analyses may then be used generatively. For example, using their IDyOM model, my colleagues Marcus Pearce and Geraint Wiggins have obtained statistical data on various symbolic corpora of Classical tonal music, and used it not only to predict musical perceptual and performative features, but also to generate chorales and other musical forms in keeping with earlier styles (cf. Pearce and Wiggins 2006; Wiggins, Pearce and Mullensiefen 2009). These outputs were produced unsupervised, that is, without musical information being provided separately by an expert. OMAX, developed under the auspices of IRCAM, is a large endeavour which allows real-time use of related principles and operates on audio data streams as well as symbolic data such as note descriptions and other MIDI information.
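IDyOM itself uses variable-order Markov modelling over multiple musical viewpoints; as a much simpler illustration of the underlying statistical-generative principle, the toy sketch below (my own, not IDyOM or OMAX code) learns first-order pitch transition counts from a small symbolic corpus and generates a new sequence from them.

```python
# A toy first-order Markov model of pitch transitions, illustrating the general
# statistical-generative principle (my own sketch, not IDyOM or OMAX code).
import random
from collections import defaultdict

def train(corpus):
    """Count pitch-to-pitch transitions over a list of pitch sequences."""
    transitions = defaultdict(lambda: defaultdict(int))
    for sequence in corpus:
        for a, b in zip(sequence, sequence[1:]):
            transitions[a][b] += 1
    return transitions

def generate(transitions, start, length=16, seed=None):
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        nexts = transitions.get(out[-1])
        if not nexts:                      # dead end: restart from the start pitch
            out.append(start)
            continue
        pitches, counts = zip(*nexts.items())
        out.append(rng.choices(pitches, weights=counts)[0])
    return out

if __name__ == "__main__":
    corpus = [[60, 62, 64, 62, 60, 67, 65, 64, 62, 60],
              [60, 64, 67, 65, 64, 62, 60, 62, 64, 60]]   # toy symbolic data
    model = train(corpus)
    print(generate(model, start=60, seed=2))
```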
The meso- (sectional) approach is one which can flow, for example, from our own analyses of acoustic intensity profiles in realized music. We found that intensity rises and falls follow a statistically dominant pattern not only on an event-by-event basis, but also in terms of sections based on metrical units, phrases or even longer time frames. In this pattern, rises are shorter, and involve faster changes in acoustic sound pressure level, than falls. Given that there are finite limits to feasible (or survivable) acoustic intensity levels, the rises and falls are generally balanced overall, so that the average loudness remains fairly constant. There are potential explanations in terms of psychoacoustic or evolutionary adaptation phenomena: for example, an increasing intensity over time is more likely to signal danger in the natural environment than is a decreasing pattern. There are also analogies with movement kinematics, and with devices built into, or accessible within, performance parameter control software such as Director Musices. 5[5. Director Musices is available on the Speech, Music and Hearing website (KTH Royal Institute of Technology, Stockholm). For a description of the software, see Friberg et al. (2000).]
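One simple way of turning this observation into a generative device is sketched below. The segment durations and dB range are my own illustrative assumptions, not values taken from the studies mentioned: rises are drawn shorter (and therefore steeper) than falls, and the level oscillates around a roughly constant mean.

```python
# A sketch of an intensity envelope generator based on the observed asymmetry:
# rises are shorter and steeper than falls, and the overall level stays roughly
# balanced. The durations and dB range are illustrative assumptions only.
import random

def intensity_profile(n_segments=8, low_db=55.0, high_db=80.0, seed=None):
    rng = random.Random(seed)
    breakpoints = []                       # (time in seconds, level in dB SPL)
    t, level = 0.0, (low_db + high_db) / 2
    for i in range(n_segments):
        rising = (i % 2 == 0)
        duration = rng.uniform(2.0, 6.0) if rising else rng.uniform(6.0, 15.0)
        target = rng.uniform(level, high_db) if rising else rng.uniform(low_db, level)
        t += duration
        level = target
        breakpoints.append((round(t, 2), round(level, 2)))
    return breakpoints

if __name__ == "__main__":
    for time_s, db in intensity_profile(seed=3):
        print(f"{time_s:7.2f} s  ->  {db:5.1f} dB")
```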
I have used this feature of intensity profiles as a structuring device in some live algorithms (see, for example, Dean 2003). My ensemble, austraLYSIS, recently performed a collaborative audio-visual work with Will Luers in which one of the sound components is an algorithmic electroacoustic composition based on such intensity profiles (Fig. 1). As soon as one does this, the issue of temporal scale is raised. If the observed temporal asymmetry in duration of rises and falls is applicable over durations up to 20–30 seconds, then what of longer periods? To obtain the same strength of empirical data on this is difficult, if not impossible, with music, since the precision of an analysis increases with the number of data points obtained. To obtain an acceptable level of precision concerning musical patterns operating on a macro-scale (perhaps 10 minutes), one might need to analyse multiple (say 50) musical works, each of which is at least 16 hours long, so that each piece would provide at least 100 data points for the analysis (100 × 10 min ≈ 16.7 hours)! This argument assumes that we wish to attack the problem on the basis of patterns within individual pieces, granted that pieces are surely sometimes composed on quite different bases. The problem is of course not the analysis, but the availability of material. An approach might be feasible on the assumption that intensity profiling is common to all pieces, pooling samples from them without regard to their source: but this is assuming more than I would wish. Rather, I would expect that a composer always has to take quite arbitrary decisions in applying any such statistical data about temporal profiles of intensity to their composition and improvisation. Arguably, this is analogous to the “squeezing into a new context” I discussed in the preceding section. But can one find an alternative approach that limits this arbitrariness if so wished? We will return to this issue, in relation to live coding, in the concluding section of the article.
Another “meso-” approach I am pursuing involves time series analysis models of music and its performance. These have generative application. Most, if not all, musical events show what is called “serial correlation”, which means sequential temporal correlation (and does not refer to serial music). For example, a high note is most often followed by another high note, and not by a very low one; a loud note by another loud note; similarly, movement patterns are necessarily continuous, so that the position of the movement at one instant is the strongest predictor of the (nearby) position at the next. A time series model of a continuous process is a mathematical formulation that takes this serial correlation into account, as well as the impact of controlling factors (such as acoustic intensity) on other continuous processes (such as note duration or pitch). 6[6. See Bailes and Dean (2012) for details of how the method can be implemented within a context of music performance analysis.] Furthermore, this influence may of course be reciprocal rather than unidirectional. If one builds such a model of an ongoing musical stream, such as the events entering Max/MSP, then that model, expressed in terms of the relevant musical features (be they timbral power spectrum or acoustic intensity), can be used to generate future events. The model can be updated regularly or it can be static, representing an image of the whole continuous process as a single entity.
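A minimal autoregressive sketch of the idea follows. It is not the Max externals described in the next paragraph, and for brevity it omits the exogenous “controlling” series: coefficients predicting each value of a musical feature from its recent past are fitted by least squares, and the fitted model is then rolled forward to generate new values.

```python
# A minimal autoregressive (AR) sketch of time-series-based generation: fit
# AR coefficients by ordinary least squares, then roll the model forward to
# generate future values of a musical feature such as acoustic intensity.
import numpy as np

def fit_ar(series, order=2):
    """Estimate AR(order) coefficients (plus an intercept) by least squares."""
    y = series[order:]
    X = np.column_stack([series[order - k - 1:len(series) - k - 1]
                         for k in range(order)])
    X = np.column_stack([np.ones(len(y)), X])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs                          # [intercept, lag-1 coeff, lag-2 coeff, ...]

def generate(series, coeffs, steps=8, noise_sd=0.0, seed=None):
    rng = np.random.default_rng(seed)
    order = len(coeffs) - 1
    out = list(series[-order:])
    for _ in range(steps):
        lags = out[-order:][::-1]          # most recent value first
        nxt = coeffs[0] + float(np.dot(coeffs[1:], lags))
        out.append(nxt + rng.normal(0.0, noise_sd))
    return out[order:]

if __name__ == "__main__":
    # e.g. a stream of acoustic intensities (dB) entering Max/MSP
    observed = np.array([62, 64, 67, 66, 63, 61, 64, 68, 70, 66, 63, 62.0])
    model = fit_ar(observed, order=2)
    print(generate(observed, model, steps=6, noise_sd=0.5, seed=4))
```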
Currently, Geraint Wiggins and I are developing Max externals that incorporate time series models that can be updated and that act generatively in real time. We have working prototypes that are already useful and plan to integrate this work with that mentioned above on real-time IDyOM implementations. The two approaches are potentially highly complementary, as IDyOM is based on the micro- and sometimes meso-levels, while time series analysis is essentially macro- (though it too can be broken down into successive meso-level sequences).
Sonification
Somewhere between the categories and continua described above lies sonification, as used in compositional contexts. Sonification is primarily the representation of data in sound, with a view to facilitating the recognition and comprehension of meaning and pattern in the data. A wide range of techniques are used, from the quite literal (where, for example, the frequency of occurrence of some event might determine the pitch or loudness of the sound representing it) to highly filtered mappings. 7[7. For an excellent recent overview, see The Sonification Handbook (Hermann, Hunt and Neuhoff 2011). David Worrall has also written in depth on various techniques and æsthetic approaches (see Worrall et al. 2007; Worrall 2009).] In the context of composition and sonic interaction, sonification merits consideration because gesture or other performative components may be mapped to sonic outputs by the same processes. In addition, music can be made in relation to real-time data streams arriving from any process on (or outside) the planet. I will not pursue the topic further here, but I bring it up for continuity, and note finally that most of the issues of interaction discussed elsewhere in the article are relevant to its use.
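As a deliberately literal example of the simplest kind of mapping just described (my own illustration, not drawn from the Handbook or from SoniPy), the sketch below scales a data series linearly onto MIDI pitch and maps the magnitude of change between successive values onto note velocity.

```python
# A deliberately literal sonification sketch: data values are mapped linearly
# onto MIDI pitch, and the size of change between successive values onto
# velocity (loudness). The ranges chosen are illustrative assumptions.
def scale(value, in_lo, in_hi, out_lo, out_hi):
    if in_hi == in_lo:
        return out_lo
    return out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)

def sonify(data, pitch_range=(48, 84), velocity_range=(30, 110)):
    lo, hi = min(data), max(data)
    deltas = [0.0] + [abs(b - a) for a, b in zip(data, data[1:])]
    max_delta = max(deltas) or 1.0
    events = []
    for value, delta in zip(data, deltas):
        pitch = round(scale(value, lo, hi, *pitch_range))
        velocity = round(scale(delta, 0.0, max_delta, *velocity_range))
        events.append((pitch, velocity))   # ready to send as MIDI note events
    return events

if __name__ == "__main__":
    temperatures = [12.1, 12.4, 13.0, 14.2, 13.8, 13.1, 12.0, 11.5]
    print(sonify(temperatures))
```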
Algorithmic Compositional Approaches Based on Cognitive Studies, including Computational Models of Cognition
I want to suggest here that an interesting way forward towards processing musical real-time data streams for generative purposes is provided by the empirical sciences of cognition and by computational modelling of cognition.
Let me give first an example based on a disputed aspect of psychoacoustics: the degree to which we can localize low frequencies in space. The literature on this is in conflict, partly because the two conventionally understood cues to location, inter-aural temporal and intensity differences, do not suggest good theoretical mechanisms for low frequencies. On the other hand, some empirical evidence, especially with listeners in off-centre positions and with subwoofers situated on the floor, is strongly positive: localization seems effective under such conditions. One possible line of investigation, which we are considering, is that timbral differences between sounds arriving at the two ears are consequent on asymmetries in room reflections and absorption. Another possible factor is the “seismic”-like vibrations which can be transmitted from floor speakers, and may have separable vestibular sensor mechanisms (see Hill, Lewis and Hawksford 2012; Todd, Rosengren and Colebatch 2009; also, the discussion in Dean 2014b, 105).
I raise this topic primarily to mention the use of low-frequency spatialization in interactive composition and performance — which I find to be underutilized — since most speaker arrays have only one subwoofer, conventionally deemed sufficient. It is also interesting in light of the enthusiasm some composers (notably Canadian Robert Normandeau) have for spatial organization of frequency distribution in their compositions for multi-speaker systems. We can think of this as “spectral distribution”. Normandeau has expressed a preference for elevated subwoofers (Normandeau 2008), and I share his judgement. With a pair of subwoofers, one on the floor and one elevated, many interesting possibilities may exist, and these require further perceptual investigation. 8[8. See Normandeau’s 2009 article in Organised Sound (p. 277) for a description of his technique of “timbre spatialisation”.]
To return to generative algorithms, clearly spatialization is an important aspect of electroacoustic work, and given suitable performing spaces and speaker arrays, algorithmic control of low-frequency distribution is a tempting possibility. It is partially pursued in the creative commercial software Kenaxis, a stimulating and effective interface based on Max/MSP. Kenaxis has an interesting spectral distribution option, developed by its author, Canadian composer-improviser Stefan Smulovitz.
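As a purely hypothetical sketch (not Kenaxis code, and assuming the floor/elevated subwoofer pair discussed above), the following maps a slowly varying control value onto equal-power gains for the two subwoofers, a first step towards algorithmic control of low-frequency distribution.

```python
# A hypothetical sketch of algorithmic low-frequency distribution between two
# subwoofers, one on the floor and one elevated (not Kenaxis code): a control
# value in [0, 1] is turned into equal-power gains for the pair.
import math

def sub_gains(position):
    """position = 0.0 -> all floor subwoofer; 1.0 -> all elevated subwoofer."""
    position = min(max(position, 0.0), 1.0)
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)            # (floor, elevated) gains

def slow_trajectory(duration_s, rate_hz=0.05, step_s=0.5):
    """A slow sinusoidal trajectory for the vertical 'pan' position."""
    steps = int(duration_s / step_s)
    return [0.5 + 0.5 * math.sin(2 * math.pi * rate_hz * i * step_s)
            for i in range(steps)]

if __name__ == "__main__":
    for pos in slow_trajectory(duration_s=10):
        floor_g, elev_g = sub_gains(pos)
        print(f"pos {pos:4.2f}: floor {floor_g:4.2f}, elevated {elev_g:4.2f}")
```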
Let us turn next to computational modelling of cognition 9[9. See Wiggins, Pearce and Mullensiefen for a practical introduction to “Computational Modeling of Music Cognition and Musical Creativity” (2009); also Lewandowsky and Farrell’s “Computational Modeling in Cognition: Principles and practice” (2011).] and consider two main kinds of model. The first kind deals with empirical cognitive data and seeks a parsimonious representation of it, usually in terms of putative cognitive components. The second kind uses data about basic cognitive functions, such as the speed, spiking and transmission patterns of nerve impulses, and hence the interaction between sensory and motor systems. 10[10. For a discussion of “action-sound couplings”, see Jensenius 2013, and also his PhD thesis.] Building on neuro-physiological data, large-scale models of the cognitive system have been advanced; one such model is encapsulated in ACT-R, a general theory of cognition (Anderson et al. 2004). 11[11. Also consult the ACT-R webpages (Carnegie Mellon University) for more on cognitive architecture.] These system models can then predict the speeds of various response processes, and sometimes their nature and effect.
Given either kind of model, it is not difficult to envisage a path by which it can be translated into musical action, given some assumptions about the nature of the musical substrate: whether pitch- or sound-based, metrical or otherwise, and so on. As yet there are few published accounts of a rigorous musical application of such models. A related idea, however, is the use of continuous electro-physiological data from human brains to drive music synthesis, and this approach has been more prevalent (cf. Rosenboom 1976). 12[12. Note the upcoming BCMI 2015 — 1st International Workshop on Brain-Computer Music Interfacing. Also see eContact! 14.2 — Biotechnological Performance Practice / Pratiques de performance biotechnologique (July 2012).] Given this, I suggest there is reason to hope that future work will provide further controlled, systematic application of computational cognitive models to music-making.
Interaction with Live Algorithms and the Opportunities of Live Coding
Earlier in this article I referred to the idea of “squeezing” available musical information into a new context. In their 2009 article, Brown and Sorensen suggested that live coding is a “musical practice that balances the capabilities of the computer and the human” (p. 23) and provided a compelling list of performance aspects that are specific to this form of music-making. Partly because “programming can become intuitive… a means for musical expression and an extension of musical imagination” (Ibid.), I believe that live coding has considerable, though as yet unfulfilled, potential. I propose that this potential might be explored, for example, by venturing into more complex programming techniques, as implied in the passage above on cognitive modelling. In live coding, a degree of engagement, even embodiment, is suggested. This engagement may well differ in nature or extent from that involved with live algorithms. Live coding may exploit and develop implicit understanding and processes — the “intuition” referred to by Brown and Sorensen — whereas to be effective with a live algorithm in musical performance, I believe that the performer or user has to acquire and retain explicit knowledge, at least to some extent, of how it functions.
Nick Collins et al. have discussed further aspects of “Live Coding in Laptop Performance” (2003). Renick Bell, a fellow contributor to this issue, exploits the middle ground between code and algorithm, often by opening his performance with a large barrage of pre-formed code, which he then manipulates further live, as in the concert within this symposium. 13[13. See Bell’s article “Considering Interaction in Live Coding through a Pragmatic Aesthetic Theory” in this issue. The audio of his SI13 performance is available on the Sound Islands website.] The future potential of both fields, live algorithms and live coding, and of their interfaces, is appealing.
Conclusion
I have argued that interactive and generative algorithms are a developing component of music-making. The algorithms may be based on mathematical structures applied to music, on statistical analyses of prior or ongoing music, or on cognitive understanding and modelling. I suggest that it is in the last of these areas that some major developments may occur, perhaps bringing together the strands of live algorithm and live coding, of sonification and neurophysiology, of engagement and embodiment, and of acoustic and electronic sound sources.
Acknowledgements
I would like to thank Alex McLean (slub and the University of Leeds, UK) for insightful discussion, and PerMagnus Lindborg for his valuable suggestions.
Bibliography
Anderson, John R., Daniel Bothell, M.D. Byrne, Scott Douglass, Christian Lebiere and Yulin Qin. “An Integrated Theory of the Mind.” Psychological Review 111/4 (October 2004), pp. 1036–1060.
Bailes, Freya and Roger T. Dean. “Comparative Time Series Analysis of Perceptual Responses to Electroacoustic Music.” Music Perception 29/4 (April 2012), pp. 359–375.
Bown, Oliver. “Experiments in Modular Design for the Creative Composition of Live Algorithms.” Computer Music Journal 35/3 (Fall 2011) “Emulative Algorithms and Creative Algorithms,” pp. 73–85.
Brown, Andrew R. and Andrew Sorensen. “Interacting with Generative Music through Live Coding.” Contemporary Music Review 28/1 (February 2009) “Generative Music,” pp. 17–29.
Casey, Michael, Christophe Rhodes and Malcolm Slaney. “Analysis of Minimum Distances in High-Dimensional Musical Spaces.” IEEE Transactions on Audio, Speech and Language Processing 16/5 (May 2008), pp. 1015–1028. Available online at https://engineering.purdue.edu/~malcolm/yahoo/Slaney2008(NearestNeighborsTASLP).pdf [Last accessed September 2014]
Collins, Nick, Alex McLean, Julian Rohrhuber and Adrian Ward. “Live Coding in Laptop Performance.” Organised Sound 8/3 (December 2003), pp. 321–330.
Cope, David. Virtual Music. Computer Synthesis of Musical Style. Cambridge MA: MIT Press, 2001.
_____. “Recombinant Music Composition Algorithm and Method of Using the Same.” Google Patents, 2010.
Dean, Roger T. Hyperimprovisation: Computer Interactive Sound Improvisation. Madison WI: A-R Editions, 2003.
_____. “The Serial Collaborator: A Meta-Pianist for real-time tonal and non-tonal music generation.” Leonardo 47/3 (June 2014), pp. 260–261.
_____. “Low-Frequency Spatialization in Electro-Acoustic Music and Performance: Composition meets perception.” Acoustics Australia 42/2 (August 2014), pp. 102–110.
Dean, Roger T. (Ed.). The Oxford Handbook of Computer Music. New York: Oxford University Press, 2009.
Dean, Roger T. and Freya Bailes. “Time Series Analysis as a Method to Examine Acoustical Influences on Real-Time Perception of Music.” Empirical Musicology Review 5 (2010), pp. 152–175.
Dean, Roger T., Kirk N. Olsen and Freya Bailes. “Is There a ‘Rise-Fall Temporal Archetype’ of Intensity in the Music of Joseph Haydn? The Role of the Performer.” Music Performance Research 6 (2013), pp. 39–67.
Ebcioğlu, Kemal. “An Expert System for Harmonizing Four-Part Chorales.” Computer Music Journal 12/3 (Fall 1988), pp. 43–51.
Friberg, Anders, Vittorio Colombo, Lars Frydén and Johan Sundberg. “Generating Musical Performances with Director Musices.” Computer Music Journal 24/3 (Fall 2000) “Winners of the Bourges Software Competition,” pp. 23–29.
Hermann, Thomas, Andy Hunt and John G. Neuhoff (Eds.). The Sonification Handbook. Berlin: Logos Verlag, 2011.
Hill, Adam J., Simon P. Lewis and Malcolm O.J. Hawksford. “Towards a Generalized Theory of Low-Frequency Sound Source Localization.” Proceedings of the Institute of Acoustics 344 (2012), pp. 138–149.
Jensenius, Alexander Refsum. “An Action-Sound Approach to Teaching Interactive Music.” Organised Sound 18/2 (August 2013) “Best Practices in the Pedagogy of Electroacoustic Music and its Technology,” pp. 178–189.
Lewandowsky, Stephan and Simon Farrell. Computational Modeling in Cognition: Principles and practice. Thousand Oaks CA: Sage Publications, Inc., 2011.
Lewis, George. “The Virtual Discourses of Pamela Z.” Journal of the Society for American Music 1/1 (February 2007), pp. 57–77.
Nierhaus, Gerhard. Algorithmic Composition: Paradigms of Automated Music Generation. Vienna: Springer, 2009.
Normandeau, Robert. “Timbre Spatialisation: The medium is the space.” Organised Sound 14/3 (December 2009) “ZKM 20 Years,” pp. 277–285.
_____. Private correspondence. July 2008.
Pearce, Marcus T. and Geraint A. Wiggins. “Expectation in Melody: The Influence of Context and Learning.” Music Perception 23/5 (June 2006), pp. 377–405.
Pressing, Jeff. “Cybernetic Issues in Interactive Performance Systems.” Computer Music Journal 14/1 (Spring 1990) “New Performance Interfaces (1),” pp. 12–25.
Rosenboom, David (Ed.). Biofeedback and the Arts: Results of early experiments. Vancouver: Aesthetic Research Centre of Canada, 1976.
Todd, Neil P.M., Sally M. Rosengren and James G. Colebatch. “A Utricular Origin of Frequency Tuning to Low-Frequency Vibration in the Human Vestibular System?” Neuroscience Letters 451/3 (February 2009), pp. 175–180.
Wiggins, Geraint A., Marcus T. Pearce and Daniel Mullensiefen. “Computational Modeling of Music Cognition and Musical Creativity.” In The Oxford Handbook of Computer Music. Edited by Roger T. Dean. New York: Oxford University Press, 2009.
Worrall, David, Michael Bylstra, Stephen Barrass and Roger T. Dean. “Sonipy: The Design of an Extendable Software Framework for Sonification Research and Auditory Display.” ICAD 2007 — Immersed in Organized Sound. Proceedings of the 13th International Conference on Auditory Display (Montréal: The Schulich School of Music of McGill University, 26–29 June 2007). Available online at http://www.music.mcgill.ca/icad2007 [Last accessed 7 June 2014]
Worrall, David. “An Introduction to Data Sonification.” In The Oxford Handbook of Computer Music. Edited by Roger T. Dean. New York: Oxford University Press, 2009.