Intelligent Real-time Composition

by Arne Eigenfeldt

School for the Contemporary Arts, Simon Fraser University (Vancouver)

Introduction
Codifying Musical Knowledge
Kinetic Engine — One Model for Realtime Composition
Future Directions
Notes | Bibliography | Author Biography

Introduction

Real-time Composition vs. Improvisation

First, let me define what I mean by real-time composition, and how I view it differently from improvisation. It is a personal definition — related to Joel Chadabe’s notion of “interactive composition” — that I arrived at in an effort to justify, to myself, why I am interested in improvisation, without considering myself an improviser.

George Lewis, whose credentials as an improviser and musical philosopher cannot be challenged, strongly asserts that improvisation is not real-time composition (see Lewis 2000). I would also suggest the opposite, that real-time composition is not improvisation. Grove’s Dictionary of Music suggests that improvisation is the creation of a musical work as it is being performed, while Benson further suggests there are many shades of improvisation, ranging from subtle levels of interpretation to free extemporaneous creation (Benson 2003). Grove’s is less concise in its definition of composition. Rather than attempting a universal definition, I will give a more personal one that suits my purpose here: composition being the deliberated ordering of musical materials, occurring prior to the performance. As such, it is possible to retain control over many musical strands, due entirely to the separation of (contemplative) design from presentation. This paradigm has resulted in one of the distinguishing features of Western Art Music: its vertical and horizontal complexity over long periods of time.

It would seem that such complexity cannot be arrived at extemporaneously and collectively, at least not in the creation of organized complexity. Randomness, particularly constrained randomness, has been used in music to generate complexity; however, this produces what Weaver calls disorganized complexity, which can be understood (as well as generated) by statistical models (Weaver 1948). Organized complexity, by contrast, arises through the interaction of its parts, and has the potential for emergent properties. While collective improvisation can produce such properties, its goals will be simpler by necessity, and the interaction of its parts limited, as its participants can only react to surface features (sound), rather than to any developing underlying structures, implicit or otherwise.

Composition, therefore, requires deliberation in order to achieve an organized complexity: this is seemingly impossible in realtime, at least amongst humans. However, through the use of computers, it is possible to create a musical work during performance that has a correlation between elements of the system.

Improvisation within Composition

Personal experience also suggests other similarities, and differences, between improvisation and performance. After teaching composition to university students for over a decade, as well as attempting an auto-ethnographic analysis of my own compositional methods (whether acoustic or electroacoustic), I have come to believe that the initial stage of composition is, in fact, improvisational. This is the point at which intuition plays the greatest part, the point in which “inspiration” appears. Once the general idea has appeared, whether on paper, in a sequencer(1), or “under the hands” in the case of a pianist, the hard work begins: the craftsmanship of composition. At this point, composers are forced to rely upon learned methods, perhaps their own “bag of tricks” — which, arguably, can be considered “style” — to manipulate the material into complex relationships.

Codifying Musical Knowledge

After spending years attempting to use the computer as a compositional partner, both acoustically and electroacoustically, I feel strongly that its use in the initial stage of composition is limited, while at the latter stage it is limitless. These demarcations stem from the computer’s requirement of a clear codification of rules into algorithms. What defines a good musical idea — the initial germ of an idea — is difficult, if not impossible, to codify; at least the rules of compositional craft are more formally developed and explicitly stated.

In listening to student works in progress, composition teachers may attempt not to judge the ideas themselves (although they may point out their limited usefulness), but instead comment upon how these ideas are being used — and by this I mean developed, articulated, manipulated, evolved, etc. (2) We may point out that certain ideas engender certain types of development, while other ideas suggest potential relationships to other ideas. This knowledge is built over time within the composer through experience; eventually, an internal rule set, implicit or explicit, results.

Is it not possible, therefore, to codify such knowledge within software? Is it not possible for the composer to suggest the musical idea in some way, provide the materials to be used, and then allow the software to piece together the material, develop it, combine it, in ways that reflect the codification? And can this be done in performance, thereby combining the spontaneity of improvisation, with the deliberation — and potential for the control over complexity — offered by traditional composition?

Old-School Interaction

Joel Chadabe was one of the first composers to pursue the notion of live composition through what he considered “interactive composition” (Chadabe 1984). This involved “a mutually influential relationship between performer and instrument” — the instrument, in this initial case, being a CEMS analog computer. He defined it as a process in which:

the instrument is programmed to generate unpredictable information to which the performer reacts during performance. At the same time, the performer at least partially controls the instrument. Since the instrument influences the performer while, at the same time, the performer “influences” the instrument, their relationship is mutually influential and, consequently, interactive.

Although this may read as a variation of traditional improvisation in that a performer controls an instrument, what is important to realize is that Chadabe’s instrument was capable of — and indeed produced — multiple musical gestures at the same time. The complexity of the individual layers could not be controlled directly, and Chadabe was forced to “sculpt” the sound at a higher level. Sal Martirano, working with a similar system, suggested the analogy of “driving a bus” (Chadabe 2007).

The notion of unpredictability seemingly modeled the interaction experienced by musicians within an improvisational setting. In the previous decade, Max Mathews had explored the use of the computer to vary the performance of pre-composed musical material, in much the way a musician might subtly vary the details of the music, or what one would consider expressive performance. However, this suggests an attempt at bringing established performance practice into the realm of computer music, rather than attempting to create a new paradigm of musical creation.

Other early practitioners of interactive computer music, such as David Behrman, George Lewis, and Martin Bartlett, also explored the potential of the computer as an improvisational tool, and unanimously used random methods to generate the requisite unpredictability. I discuss elsewhere the historical reasons for using randomness in these situations, and point out its limitations. Returning to Weaver’s concepts of disorganized versus organized complexity, I would suggest that neither improvisation nor composition (Cage and his followers notwithstanding) relies upon randomness as a structural determinant; therefore, in order to fully exploit the potential of the computer as a compositional tool of organized complexity, one needs to move beyond randomness.

Abstracting Music

The first step in codifying musical knowledge is representing it in a format understandable to both humans and computers. Such representations abound in acoustic music due to its abstraction into symbolic representation (musical notation) over a millennium ago. Those elements that were codified early (melody, rhythm, and later harmony) can be individually represented, as can their relationships. For example, the triad A C# E can be represented as MIDI note numbers (57, 61, 64), pitch classes relative to the root (0, 4, 7), frequencies (approximately 220, 277, and 330 Hz in equal temperament), or in relation to other triads (IV in the key of E major).
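As a minimal sketch of how these representations relate, the following Python converts MIDI note numbers into the other forms. It assumes the common tuning convention in which MIDI note 69 is A at 440 Hz, and twelve-tone equal temperament; the function names are illustrative only.

```python
def midi_to_frequency(note, a4=440.0):
    """Equal-tempered frequency in Hz, assuming MIDI note 69 = A = 440 Hz."""
    return a4 * 2 ** ((note - 69) / 12)


def pitch_class(note):
    """Absolute pitch class with C = 0 (every C is a multiple of 12 in MIDI)."""
    return note % 12


def intervals_from_root(notes):
    """Semitone distance of each note from the first (root) note, modulo 12."""
    root = notes[0]
    return [(n - root) % 12 for n in notes]


triad = [57, 61, 64]  # A C# E
print([pitch_class(n) for n in triad])                  # [9, 1, 4]
print(intervals_from_root(triad))                       # [0, 4, 7]
print([round(midi_to_frequency(n), 2) for n in triad])  # [220.0, 277.18, 329.63]
```

The same three integers thus yield absolute pitch classes, a root-relative shape that identifies any major triad, and physical frequencies, depending on which relationship the software needs to reason about.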

MIDI, the standard for digital music that appeared in the 1980s, provided a convenient, if limited, representation of music for interactive composers. As I suggest in the earlier cited paper, the limited timbral possibilities offered by the MIDI synthesizers in those years forced composers interested in real-time interaction to focus upon a complexity of interaction, rather than one of timbre. Indeed, MIDI offers a convenient method of determining exactly what a performer is playing (e.g. note number 60, with a velocity of 88, a duration of 112 ms, and a delay of 230 ms from the previous onset), and an equally convenient response (e.g. play MIDI notes 60, 62, 63 with velocities of 90, durations of 125 ms, and entry delays of 250 ms). In contemporary live electronics, the ability to record a performer’s audio and/or manipulate its acoustic properties in sophisticated ways lessens the need to understand what the performer is actually playing.

This is partly due to the fact that timbre is much more difficult to represent in a simple format. While differentiating between timbres in real time is becoming possible through elements of machine learning, we still lack useful methods of organizing them (although this, as well, is becoming possible). For example, we may classify different recordings of wine glasses as “rubbed”, “struck”, “scratched”, or “smashed”: those classifiers may invoke an aural impression of these recordings, but only after actually listening to them (as opposed to MIDI note 60, which, as a representation, has a clear meaning). During the compositional process, we may prefer “rubbed_5” for its unusual “hollow” quality: how can we ascertain a similar sound, or one that is related in some way, without auditioning our materials again? Current research into classifying audio (see ISMIR, the International Conference on Music Information Retrieval) will, no doubt, provide methods in the future. But not quite yet.

Elsewhere, I explain one method that I have used to organize timbres based upon spectral analysis. While this initial research was immediately useful, the parameters were admittedly quite limited. More importantly, it also demonstrated the tremendous amount of work still required to formalize timbral relationships.

Setting Limits: What should the software know?

Recognizing the difficulties of defining an all-inclusive system, I decided early on to limit the knowledge I would attempt to encode, and focus upon rhythmic interaction. Furthermore, I would make no attempt at creating a universal system that embodied all rhythmic knowledge; in other words, sacrificing generality for strength. Related to this lack of generality, I would make no attempt to satisfy any æsthetic criteria other than my own, at least for the moment. It would not be a musicological tool, and therefore make no attempt at recreating any given musical type or style; instead, it would be purely compositional, and one that modeled my own compositional æsthetic.

Straddling the line between scientific research and artistic creation, I also decided to dispense with subjective testing of listeners (for example, asking listeners “Is rhythm A more interesting than rhythm B?”). Because I am trained as a composer, and thus trained in making those decisions, I feel that the success of the system can be judged purely by my own subjective response (3).

Kinetic Engine — One Model for Realtime Composition

The results of my research, begun in 2005, can be found in technical descriptions elsewhere (see Bibliography) — I will only give a general description here. Version 1 of Kinetic Engine was created with two purposes in mind: firstly, to intelligently and continually vary material without user supervision (hence its name); and secondly, to explore the interrelationship between multiple rhythmic streams so as to create an evolving, yet perceptible, “groove”. Version 2 incorporated multiple “agents” which could loosely be compared to improvising percussionists.

Multi-Agents

Intelligent agents are elements of code (programs or patches) that operate without direct user interaction (they are autonomous), interact with one another (they are social), interact with their environment (they are reactive), and make decisions as to when they should operate, and what they should do (they are proactive) (Wulfhorst 2003). Since these are also attributes required of musicians in improvisational settings, the use of agents to emulate human-performer interaction has proven to be a fertile field of research.

In Kinetic Engine, musical knowledge is contained within the agents themselves — for example, how to create rhythmic patterns, how to vary them, and, most importantly, how to interact with other agents. A great deal of effort was applied to the interaction of agents, so as to create a rhythmic polyphony that, though created artificially, could have been created by a human (this being one definition of artificial intelligence). Although there still remain elements of randomness, these are limited to probability distributions. Examples of the system can be seen and heard here.

Both of these versions of Kinetic Engine produce complex, evolving rhythms that require no user input (both are rule-based, rather than database, systems) and little user interaction. However, since the rule-base is rather small, Kinetic Engine tends to produce homogeneous music over longer periods of time. If left to perform on its own, the music created after one hour would be essentially the same as earlier, although the details would differ. While adherence to the rules ensures interesting music, it also limits unpredictable behaviour.

In both cases, the audio output of the system is produced either via sample playback or by controlling a twelve-armed percussion robot (see Eigenfeldt and Kapur 2008).

Recomposition through Analysis

I suggested earlier that the use of the computer to generate ideas may be limited due to the difficulty in codifying the essence of what makes a good musical idea. This is further complicated by context, in that certain ideas may prove useful within some styles while limited in others, or they may acquire importance in relation to other ideas. David Cope has pursued an associative learning strategy that attempts to build a comprehensive database of a variety of materials, including musical, textual, and symbolic items (Cope 2005). This may eventually offer extremely powerful models for artificial musical creativity.

It may be a while before computers are able to create interesting music completely from scratch: thus, my starting point for version 3 of Kinetic Engine is to give the software “raw” ideas from which to derive learned techniques. The techniques are thus very specific; rather than attempting to apply machine learning to build associations of all types of possibilities (as Cope does), I am severely limiting the knowledge required for individual sections. In effect, I am telling the software agent to “play kind of like this” for a given section, then to immediately forget all those rules and “now play kind of like this”.

In this sense, the rule-base has become dynamic, generated by the source material itself. Given a set of ur-rhythms in the form of a monophonic MIDI score, the system analyses the music to determine tendencies, such as beat patterns, phrase lengths, cadential material, pattern construction, and relationships between patterns (if patterns are found).

The MIDI part is first parsed by an Analysis agent to determine a subdivision representation for each beat (see figure 1, below).

Figure 1. The first eight represented subdivisions.

Since the music is now represented symbolically, organization will occur on the representation, rather than the data itself, making it that much more flexible.

See figure 2 for an example of how a given measure is represented by subdivisional representations: (11 13 8 0).

Figure 2. Representations for an example measure.

The individual beat representations are grouped into phrases, determined either by rests or repetitions, and these phrases are then analysed for state transitions: in other words, the probability that a certain subdivision will follow (or precede) another subdivision. This information will be used in the construction of phrases by the Player agent using first-order Markov chains. Note that both forwards and backwards transitions are stored, allowing for the construction of patterns from a phrase beginning (forwards) as well as from its cadence (backwards).
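The transition analysis described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Kinetic Engine implementation: subdivision representations are treated as opaque integer codes, the three example phrases are hypothetical (only the first matches the measure given for figure 2), and all function names are invented for the sketch.

```python
import random
from collections import Counter, defaultdict


def build_transitions(phrases):
    """Count first-order transitions in both directions across all phrases."""
    forward = defaultdict(Counter)   # forward[a][b]: how often b follows a
    backward = defaultdict(Counter)  # backward[b][a]: how often a precedes b
    for phrase in phrases:
        for prev, nxt in zip(phrase, phrase[1:]):
            forward[prev][nxt] += 1
            backward[nxt][prev] += 1
    return forward, backward


def weighted_choice(counter, rng):
    """Pick a code with probability proportional to its observed count."""
    codes = list(counter)
    return rng.choices(codes, weights=[counter[c] for c in codes], k=1)[0]


def generate_forward(start, length, forward, rng):
    """Build a phrase from its beginning, stopping early at a dead end."""
    phrase = [start]
    while len(phrase) < length and forward.get(phrase[-1]):
        phrase.append(weighted_choice(forward[phrase[-1]], rng))
    return phrase


def generate_backward(cadence, length, backward, rng):
    """Build a phrase from its final beat towards its beginning."""
    phrase = [cadence]
    while len(phrase) < length and backward.get(phrase[0]):
        phrase.insert(0, weighted_choice(backward[phrase[0]], rng))
    return phrase


# Hypothetical analysed phrases, one subdivision code per beat.
phrases = [[11, 13, 8, 0], [11, 8, 8, 0], [13, 13, 8, 0]]
forward, backward = build_transitions(phrases)

rng = random.Random(0)
print(generate_forward(11, 4, forward, rng))
print(generate_backward(0, 4, backward, rng))
```

Storing the backward table separately means a phrase can be aimed at a known cadence and filled in towards its start, which a forward-only chain cannot guarantee.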

Patterns are rated in terms of their relationship to other patterns in the source material, in order to determine potential variation. For example, if it is determined that there are five discrete patterns, with only subtle differences between them, the system assumes these are potential (subtle) variations of a single pattern.

Figure 3. Example for two patterns considered close variations.

If there are greater differences between patterns, the system assumes a variety of materials, and more “block variations” will result. These determinations are based upon a variety of factors, including similarity of density, subdivisions, and internal pattern repetitions.

Figure 4. Three patterns considered significantly different, and thus unique.

The analyses are saved as XML files, and can be considered to represent a “tendency” for a particular section.

Variation through Genetic Algorithm

The Analysis agent, determining the relevant musical material off-line, does the “heavy lifting”: an analysis of 32 bars of monophonic music takes approximately thirty seconds. In performance, a Player agent will use the analyses to generate new material. The patterns and the rules do not remain static during performance, however; they undergo continual variation through reanalysis and regeneration, that is, evolution via Genetic Algorithm (Horowitz 1994).

When each Player agent first loads, it generates a limited set (32) of discrete rhythmic patterns, a population of rhythms, based upon the XML analysis file. These patterns are analysed for similarities to one another, and for tendencies, using the same algorithms that were applied to the source material. The Player agent will choose one of the patterns to begin performing, then, based upon a coefficient derived from the original material, decide when to vary the material, and by how much: in the latter case, another pattern from the population is chosen based upon its similarity relationship to the current pattern.
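One way to picture this similarity-based choice is the sketch below. The similarity measure (the fraction of beats whose codes match) is a deliberately simple stand-in, not the measure used in Kinetic Engine, which the text describes as drawing on density, subdivisions, and internal repetitions; the `amount` parameter and the example population are likewise assumptions made for illustration.

```python
def similarity(a, b):
    """Hypothetical measure: fraction of beats on which two patterns agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def choose_variation(current, population, amount):
    """Pick the pattern whose similarity to `current` is closest to 1 - amount.

    amount = 0.0 asks for a near-identical pattern; amount = 1.0 asks for
    the most contrasting one available in the population.
    """
    target = 1.0 - amount
    candidates = [p for p in population if p != current]
    return min(candidates, key=lambda p: abs(similarity(current, p) - target))


# Hypothetical population, one subdivision code per beat.
population = [(11, 13, 8, 0), (11, 13, 8, 8), (13, 8, 8, 0), (0, 8, 13, 11)]
current = population[0]

print(choose_variation(current, population, amount=0.25))  # (11, 13, 8, 8)
print(choose_variation(current, population, amount=0.5))   # (13, 8, 8, 0)
print(choose_variation(current, population, amount=1.0))   # (0, 8, 13, 11)
```

Small amounts therefore yield subtle variations of the current pattern, while large amounts produce something closer to a block change.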

After a certain amount of time — usually specified by the user in performance — a new population is generated after culling a portion of the initial population. Since this new population is generated based upon the analysis of the initial generation (rather than the source material), an amount of rhythmic evolution will have occurred; furthermore, certain patterns from the original population will have remained — since the entire first generation was not destroyed — thereby ensuring a relationship to the original.

In the example below (figure 5), an initial population is made up of three initial patterns (p1, p2, p3, derived from the analysis data), from which there are a number of variations (children) generated (p1a, p1b, p1c, etc.). After a number of the patterns are culled (randomly at this point), one can see that p3 was completely eliminated, and is therefore out of the gene pool. The second generation contains both original patterns (p1b, p1c, p2b) and variations of those patterns (shown in red). After a random selection, half of these patterns are culled, and variations of the 2nd generation are created. The final result happens to contain some of the first generation (p1b), some of the 2nd generation (p1c_a, p1c_b, p2b_b), as well as children of the 2nd generation (shown in red). Parent patterns are contained within a data-field of the children, so that “age”, as well as predecessors, can be tracked.

Figure 5. Population growth showing three generations.
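The generational process described above can be sketched as follows. Two simplifications are assumed: culling is random, as the text says it currently is, and children are produced by a hypothetical one-beat mutation rather than by re-analysing the surviving generation as Kinetic Engine does. The class and function names are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Pattern:
    beats: Tuple[int, ...]
    parent: Optional["Pattern"] = None  # predecessor kept in a data field

    @property
    def generation(self):
        """Number of ancestors: 0 for patterns derived from the analysis."""
        count, node = 0, self.parent
        while node is not None:
            count, node = count + 1, node.parent
        return count


def vary(pattern, codes, rng):
    """Hypothetical variation: replace one randomly chosen beat."""
    beats = list(pattern.beats)
    beats[rng.randrange(len(beats))] = rng.choice(codes)
    return Pattern(tuple(beats), parent=pattern)


def next_generation(population, size, keep_fraction, codes, rng):
    """Cull at random, then refill the population with children of survivors."""
    keep = max(1, int(len(population) * keep_fraction))
    survivors = rng.sample(population, keep)
    children = []
    while len(survivors) + len(children) < size:
        children.append(vary(rng.choice(survivors), codes, rng))
    return survivors + children


rng = random.Random(0)
codes = [0, 8, 11, 13]  # hypothetical subdivision codes
first = [Pattern((11, 13, 8, 0)), Pattern((11, 8, 8, 0)),
         Pattern((13, 13, 8, 0)), Pattern((8, 8, 8, 0))]
second = next_generation(first, size=4, keep_fraction=0.5, codes=codes, rng=rng)
third = next_generation(second, size=4, keep_fraction=0.5, codes=codes, rng=rng)
for p in third:
    print(p.beats, "generation", p.generation)
```

Because survivors are carried over unchanged, each new population keeps a traceable link to the original material, and following the `parent` field recovers both the age and the lineage of any pattern.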

In GA literature, deciding which data to destroy, and which to allow to reproduce, is considered the fitness test, and one of the most difficult aspects of applying GA techniques to music. If we are attempting to generate “interesting” music, how does the software rate which pattern is more interesting than another? (4) The solution in Kinetic Engine is to assume that all patterns pass the fitness test, since all patterns were generated from source material provided by the composer (although work is underway to use “potential for agent interaction” as a criterion for reproduction).

At any point, the composer can reintroduce elements of the original source material, generate a completely new population based upon new source material, or combine different source materials.

Future Directions

At present, version 3 of Kinetic Engine has its complexity based entirely upon a “horizontal” understanding of rhythm, whereas version 2’s intelligence focused upon one of “vertical” interaction. Therefore, an Interaction agent is being developed to monitor Player agent data, and provide information to all agents about tendencies and potential for interaction. Agent patterns will be rated for complexity (roughly, the amount of syncopation) and density (how many notes per pattern), and this information will be compared to other agents’ data and used in the selection of initial patterns and variations, as well as informing the Player agent which portions of the population to cull.
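As a rough illustration of the two ratings, the sketch below assumes a pattern is stored as a grid of sixteenth-note slots (1 for an onset, 0 for silence), four slots per beat. Density follows the definition given above; the complexity figure is only a crude stand-in for syncopation (the share of onsets that fall off the beat) and is not the measure planned for Kinetic Engine.

```python
def density(onsets):
    """Number of notes in the pattern."""
    return sum(onsets)


def offbeat_ratio(onsets, slots_per_beat=4):
    """Crude syncopation proxy: share of onsets that do not fall on a beat."""
    total = sum(onsets)
    if total == 0:
        return 0.0
    off = sum(1 for i, hit in enumerate(onsets)
              if hit and i % slots_per_beat != 0)
    return off / total


# Two hypothetical one-bar patterns on a sixteenth-note grid.
straight = [1, 0, 0, 0] * 4
syncopated = [0, 0, 1, 0,  0, 1, 0, 0,  1, 0, 0, 1,  0, 0, 1, 0]

print(density(straight), offbeat_ratio(straight))      # 4 0.0
print(density(syncopated), offbeat_ratio(syncopated))  # 5 0.8
```

With ratings of this kind available for every agent, an Interaction agent could, for instance, pair a dense, regular pattern in one voice with a sparse, syncopated one in another.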

Towards Realtime Composition

Kinetic Engine offers one model that moves towards a paradigm of realtime composition, rather than computer improvisation. Genetic algorithms, derived from musical analysis, are used to generate a population of potential rhythms, and musically intelligent methods are used to navigate through this database.

The music generated by Kinetic Engine has proven to be exciting in its volatility, yet sophisticated in its construction and evolution. Rhythms and interactions evolve over time, and exhibit emergent behaviours. In this regard, the system produces organized complexity on the same level as contemplative composition, as opposed to the disorganized complexity of music generated purely by random procedures.

Notes

1. I would include ProTools and similar digital audio mixing programs here.
2. I realize that this, in itself, will limit the type of music that I am discussing here. For the moment, I’m fine with this limitation.
3. Nic Collins has suggested that this is “playing the composer card”.
4. If the fitness test involves humans making these decisions, we come to the fitness bottleneck, in which the entire process is halted as each element is tested and rated.

Bibliography

Benson, B.E. The Improvisation of Musical Dialogue. Cambridge University Press, 2003.

Chadabe, Joel. “Interactive Composing.” Computer Music Journal 8/1 (1984), p. 143.

_________. “Electronic Music: Unsung Revolutions of the 20th Century.” Per Contra 6 (Spring 2007). Online Journal. Available at http://www.percontra.net/6music2.htm.

Cope, David. Computer Models of Musical Creativity. Cambridge MA: MIT Press, 2005.

Horowitz, Damon. “Generating Rhythms with Genetic Algorithms.” Proceedings of the International Computer Music Conference 1994.

Lewis, George. “Too Many Notes: Computers, Complexity and Culture in Voyager.” Leonardo Music Journal 10 (2000), pp. 33–39.

Weaver, Warren. “Science and Complexity.” American Scientist 36:536 (1948).

Wulfhorst, R.D., et al. “A Multi-agent Approach for Musical Interactive Systems.” Proceedings of the International Conference on Autonomous Agents and Multiagent Systems. ACM Press, 2003, pp. 584–91.
