
Interacting with Inner and Outer Sonic Complexity

From microsound to soundscape composition

by Barry Truax

It is possible to think of the two extremes of the world of sound as the inner domain of microsound (less than 50 ms) where frequency and time are interdependent, and the external world of sonic complexity, namely the soundscape. In terms of sonic design, the computer is increasingly providing tools for dealing with each of these domains through such practices as granular synthesis and multi-channel soundscape composition.

Microsound

One of the most striking developments over the past few decades has been to push the frontiers of models of sound and music to the micro level; it has become fairly common to refer to this as “microsound” (Roads 2001). At this level, concepts of frequency and time are conjoined by a quantum relationship, with an uncertainty principle relating them that is precisely analogous to the more famous uncertainty principle of quantum physics. Dennis Gabor articulated this quantum principle of sound in 1947 in his critique of the “timeless” Fourier theorem.

Gabor illustrated the quantum as a rectangular area in the time and frequency domain, such that when the duration of a sound is shortened, its spectrum in the frequency domain is enlarged. In other words, a sine tone whose duration is less than 50 ms becomes increasingly a broadband click as its duration becomes shorter. Conversely, to narrow the uncertainty in frequency analysis, a longer “window” of time is required, both in analysis and synthesis. The auditory system balances its frequency and temporal resolution in a manner that is consistent with the perception of linguistic phonemes, where the simultaneous recognition of both spectral and temporal shapes plays a crucial role in rapid identification of speech. The analogy to the Heisenberg uncertainty principle of quantum mechanics is not metaphorical but exact, because just as velocity is the rate of change of position (hence the accuracy of determination of one is linked to a lack of accuracy in the other), so frequency can be thought of as the rate of change of temporal phase.
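The trade-off between duration and bandwidth can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the 1000 Hz tone, the 8000 Hz sample rate, the Hann envelope and the function names are all choices made for this example and do not come from Gabor's paper. It measures the half-power bandwidth of a short sine burst and shows that a burst ten times shorter occupies a band roughly ten times wider.

```python
import cmath
import math

def hann_sine_burst(duration_s, freq_hz=1000.0, sr=8000):
    """Sine tone of the given duration, shaped by a Hann envelope."""
    n = int(round(duration_s * sr))
    return [
        0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
        * math.sin(2 * math.pi * freq_hz * i / sr)
        for i in range(n)
    ]

def magnitude_at(signal, f_hz, sr=8000):
    """Magnitude of the signal's spectrum evaluated at a single frequency."""
    acc = sum(x * cmath.exp(-2j * math.pi * f_hz * i / sr)
              for i, x in enumerate(signal))
    return abs(acc)

def half_power_bandwidth(duration_s, sr=8000, step_hz=5.0, fmax_hz=2000.0):
    """Width (in Hz) of the band whose magnitude is within 3 dB of the peak."""
    burst = hann_sine_burst(duration_s, sr=sr)
    freqs = [k * step_hz for k in range(int(fmax_hz / step_hz) + 1)]
    mags = [magnitude_at(burst, f, sr) for f in freqs]
    threshold = max(mags) / math.sqrt(2)
    return step_hz * sum(1 for m in mags if m >= threshold)

narrow = half_power_bandwidth(0.050)  # 50 ms burst: close to a pure tone
wide = half_power_bandwidth(0.005)    # 5 ms burst: closer to a click
print(narrow, wide)
```

The product of duration and measured bandwidth stays roughly constant across the two cases, which is the reciprocal relationship the rectangular "quantum" expresses.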

Time-frequency models, a class of sound synthesis and signal processing methods that have emerged over the last two decades, have their basis at this quantum level, such that changes in a signal’s time domain result in spectral alterations and vice versa. The best known of these methods are granular synthesis and the granulation of sampled sound, both of which produce their results by generating high densities of acoustical quanta called grains. These grains are composed of enveloped waveforms, usually less than 50 ms in duration (implying a repetition rate of more than 20 Hz), such that a sequence of grains fuses into a continuous sound, just as the perception of pitch emerges with pulses repeating at rates above 20 Hz. So-called “Gabor grains” have the frequency of the waveform independent of the grain duration, whereas “wavelets” maintain an inverse relation between frequency and duration, and hence are useful in analysis and re-synthesis models (Roads 1996).
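A minimal sketch of the grain idea follows, in Python. It is not the author's real-time implementation: the Hann envelope, the random choice of grain frequency and onset, and every name and default value here are assumptions made for illustration. Each grain is an enveloped sine shorter than 50 ms, and many of them are summed into one output buffer to form a texture.

```python
import math
import random

def make_grain(freq_hz, grain_ms, sr=44100):
    """One 'Gabor grain': a sine whose frequency is independent of its duration."""
    n = max(2, int(sr * grain_ms / 1000.0))
    return [
        0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))  # Hann envelope
        * math.sin(2 * math.pi * freq_hz * i / sr)
        for i in range(n)
    ]

def granular_stream(total_s, grain_ms=30.0, grains_per_s=100.0,
                    freq_range=(200.0, 800.0), sr=44100, seed=0):
    """Sum randomly placed grains into a buffer (assumes total_s > grain length)."""
    rng = random.Random(seed)
    out = [0.0] * int(total_s * sr)
    for _ in range(int(total_s * grains_per_s)):
        grain = make_grain(rng.uniform(*freq_range), grain_ms, sr)
        start = rng.randrange(0, len(out) - len(grain))
        for i, sample in enumerate(grain):
            out[start + i] += sample  # overlapping grains simply add
    return out

texture = granular_stream(1.0, sr=8000)
```

Raising `grains_per_s` makes the grains overlap more heavily, so the result moves from a sparse, pointillistic pattern towards a fused texture.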

However, several other established synthesis methods are now regarded as time-frequency models, for instance the VOSIM and FOF models, both originally designed for speech simulation. Each is based on an enveloped, repeating waveform — a sine squared pulse with a DC component in the case of VOSIM, and an asymmetrically enveloped sine wave in the case of FOF. Moreover, it is the time domain parameters involved in each model that control the bandwidth of the result, usually intended to shape the formant regions of the simulated vowels. Michael Clarke (1996) realized the relationship of the FOF method to granular synthesis early on, and has proposed a hybrid version called FOG. In his work, a fused formant-based sound can disintegrate into a rhythmic pattern or granular texture and then revert to the original sound, even maintaining phase coherence in the process.

In my own work, the granular concept has informed most of my processing of sampled sound, the most striking application being to stretch the sound in time without necessarily changing its pitch (Truax 1990, 1992, 1994). It is a revealing paradox that by linking time and frequency at the micro level one can manipulate them independently at the macro level. In fact, all of the current methods for stretching sound are based on some form of windowing operation, usually with overlapping envelopes whose shape and frequency of repetition are controllable. The perceptual effect of time stretching is also very suggestive. As the temporal shape of a sound becomes elongated, whether by a small percentage or a very large amount, one’s attention shifts towards the spectral components of the sound: either discrete frequency components, harmonics or inharmonics, or resonant regions and broadband textures. I often refer to this process as listening “inside” the sound, and typically link the pitches that emerge from the spectrum with those used by live performers in my own mixed compositions. [Note 1: See, for example, Dominion (1991), for chamber ensemble and two digital soundtracks; Powers of Two: The Artist (1995), electroacoustic music theatre for two singers, dancer, video and eight digital soundtracks; or the recent spectrally based work, From the Unseen World (2012), for piano and six digital soundtracks.] In other cases, the expanded resonances of even a simple speech or environmental sound suggest a magnification of its imagery and associations, as in my work Basilica, where the stretched bell resonances suggest entering the large volume of the church itself.
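The windowing principle behind time stretching can be sketched as a naive overlap-add procedure. This Python example is a simplification under stated assumptions, not the granulation technique used in the works cited: the grain length, hop size and function names are hypothetical, and no attempt is made to align the phase of successive grains, so a real stretcher would sound cleaner. Grains are read from the source at a small hop and written to the output at a larger one, so the material is spread over more time while each grain keeps its original waveform.

```python
import math

def hann(n):
    """Hann envelope of n samples."""
    return [0.5 * (1 - math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]

def granular_stretch(signal, factor, grain_len=320, hop_out=80):
    """Stretch `signal` by `factor` using overlapping enveloped grains.

    Assumes len(signal) >= grain_len and factor > 0.
    """
    hop_in = hop_out / factor              # read position advances more slowly
    window = hann(grain_len)
    n_grains = int((len(signal) - grain_len) / hop_in) + 1
    out_len = (n_grains - 1) * hop_out + grain_len
    out = [0.0] * out_len
    norm = [0.0] * out_len                 # summed envelopes, for level correction
    for k in range(n_grains):
        src = int(k * hop_in)
        dst = k * hop_out
        for i in range(grain_len):
            out[dst + i] += window[i] * signal[src + i]
            norm[dst + i] += window[i]
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]

sr = 8000
tone = [math.sin(2 * math.pi * 440 * i / sr) for i in range(sr)]  # 1 s test tone
stretched = granular_stretch(tone, factor=2.0)
```

Because the waveform inside each grain is copied without resampling, the output is about twice as long as the input without being transposed down an octave, as it would be if the sound were simply played at half speed.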

Convolution: Linking the time and frequency domains

To return to the microsound domain, we can note that another fundamental principle linking the time and frequency domains is illustrated by the technique of convolution. Convolution has been a standard topic in engineering and computing science for some time, but only since the early 1990s has it been widely available to computer music composers, thanks largely to the theoretical descriptions by Curtis Roads (1996) and the SoundHack software of Tom Erbe that made this technique accessible.

Convolving two waveforms in the time domain means multiplying their spectra (i.e. frequency content) in the frequency domain. Multiplying the spectra means that any frequency that is strong in both signals will be very strong in the convolved signal, and conversely any frequency that is weak in either input signal will be weak in the output signal. In practice, a relatively simple application of convolution is where we have the impulse response of a space. This is obtained by recording a short burst of a broadband signal as it is processed by the reverberant characteristics of the space. When we convolve any dry signal with that impulse response, the result is that the sound appears to have been recorded in that space. In other words, it has been processed by the frequency response of the space, much as it would be in the actual space. Convolution in this example is simply a mathematical description of what happens when any sound is coloured by the acoustic space within which it occurs, which is true of all sounds in all spaces except an anechoic chamber. The convolved sound will also appear to be at the same distance as in the original recording of the impulse. If we convolve a sound twice with the same impulse response, its apparent distance will be twice as far away. For instance, my work Temple (2002) is based on processing singing voices with the impulse response of a cathedral in Italy, that of San Bartolomeo in Busseto, which is available on Peak’s Impulseverb processor. Given the intrusiveness of such an action as firing a gun, the more acceptable approach was to break a balloon and record the response of the space.
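The equivalence between time-domain convolution and spectral multiplication can be verified on a toy example. In this Python sketch the short "dry" signal and the decaying "impulse response" are invented values, not measurements of any real space, and the function names are illustrative. The spectrum of the convolved signal is compared with the product of the two input spectra.

```python
import cmath

def convolve(a, b):
    """Direct time-domain convolution; output length is len(a) + len(b) - 1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def dft(x, n):
    """Discrete Fourier transform of x, zero-padded to n points."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(padded))
        for k in range(n)
    ]

dry = [1.0, 0.5, -0.25, 0.0, 0.3]                       # invented dry signal
impulse_response = [1.0, 0.0, 0.6, 0.0, 0.36, 0.0, 0.216]  # invented decaying echoes

wet = convolve(dry, impulse_response)
n = len(wet)
spectrum_of_wet = dft(wet, n)
product_of_spectra = [p * q for p, q in zip(dft(dry, n), dft(impulse_response, n))]
max_error = max(abs(u - v) for u, v in zip(spectrum_of_wet, product_of_spectra))
print(max_error)  # differs from zero only by floating-point rounding
```

Practical convolution reverbs exploit this identity in the other direction: they multiply fast Fourier transforms and transform back, which is far cheaper than the direct double loop for long impulse responses.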

In fact, one can convolve any sound with another, not just an impulse response. In that case, we are filtering the first sound through the spectrum of the second, such that any frequencies the two sounds have in common will be emphasized. A particular case is where we convolve the sound with itself, called auto-convolution, thereby guaranteeing maximum correlation between the two sources. In this case, prominent frequencies will be exaggerated and frequencies with little energy will be attenuated. However, the output duration of a convolved signal is the sum of the durations of the two inputs. With reverberation we expect the reverberated signal to be longer than the original, but this extension and the resultant smearing of attack transients also occurs when we convolve a sound with itself, or with another sound. Transients are smoothed out, and the overall sound is lengthened (by a factor of two in the case of auto-convolution). When we convolve this stretched version with the impulse response of a space, the result appears to be half way between the original and the reverberant version, a ghostly version of the sound, so to speak. In Temple, both the original sound and the version convolved with itself are convolved with the impulse response of the cathedral and synchronized to begin together, thereby producing a trailing after-image within the reverberant sound. [Note 2: Sound examples of works mentioned here, as well as information on specific works, compositional techniques and other related publications can be found on the author’s website. CDs on which works mentioned here are published are available from the Cambridge Street Records catalogue page.]
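Both effects of auto-convolution, the doubled duration and the exaggeration of prominent frequencies, follow from the spectrum being multiplied by itself. The Python sketch below uses an invented test signal (a strong 500 Hz component and a weaker 1500 Hz one at an assumed 8000 Hz sample rate); the names are illustrative only. It checks that the strength ratio between the two components is squared after the sound is convolved with itself.

```python
import cmath
import math

def convolve(a, b):
    """Direct time-domain convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def magnitude_at(signal, f_hz, sr):
    """Magnitude of the signal's spectrum at one frequency."""
    return abs(sum(x * cmath.exp(-2j * math.pi * f_hz * i / sr)
                   for i, x in enumerate(signal)))

sr = 8000
source = [
    math.sin(2 * math.pi * 500 * i / sr) + 0.25 * math.sin(2 * math.pi * 1500 * i / sr)
    for i in range(200)
]
auto = convolve(source, source)  # auto-convolution

ratio_before = magnitude_at(source, 500, sr) / magnitude_at(source, 1500, sr)
ratio_after = magnitude_at(auto, 500, sr) / magnitude_at(auto, 1500, sr)
print(len(source), len(auto))      # the output is (almost exactly) twice as long
print(ratio_before, ratio_after)   # the second ratio is the square of the first
```

Each further iteration squares the ratio again, so repeated auto-convolution rapidly leaves only the strongest spectral regions.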

Another approach, which I used in Chalice Well (2009), was to use different types of textured sounds as both sources and impulses in the convolution. These were sounds with a particulate quality: mainly water drops, splashes, streams and trickles, but also glass breaking, bubbles, percussive locks and doors, transposed male consonants and differing densities of granular synthesis textures (as used in Riverrun). Normally when one convolves continuous sounds together, the result is a smeared and quite blurred texture, but because these sounds consisted of numerous impulses, the results of the convolution were highly detailed and well defined. Moreover, the spatial qualities of the original sound files ranged from dry to reverberant, and so each combination produced a well-defined sense of space: dry convolved with dry producing a foreground texture, reverberant with reverberant a distant, background texture, and dry with reverberant a middle ground location for the sound. Each combination (approximately 200 were used) could be described as a hybrid sound, situated somewhere between synthetic and processed, and incorporating elements of each of its parents. Families of sound textures were created and documented, since each possible permutation of sources resulted in a different but related output.

While experimenting with this technique, I quickly began to sense the type of imaginary soundscape the results were suggesting, which was an underground cavern in which many types of water flows were present. One class of splashing and dripping water had in fact been recorded in a resonant well (and had already been used in the third section of Island [2000]), and these sounds made me think of the aura surrounding wells and caverns in general, and that of Chalice Well in Glastonbury in particular. The narrative idea emerged that a soundscape composition could simulate a journey down the well into the legendary and highly symbolic caverns (which have never been proven to exist). Although the convolved sounds were bright and distinct, their hybridity blurred the edges of the more realistic source sounds and supported the illusion that one was in an imaginary space.

The spatial deployment of the tracks was achieved with four sets of eight tracks each, that is, four stereo pairs spread evenly around the listener. Because of the familial nature of each set of eight tracks, and their individual spatial qualities in terms of apparent distance, it was relatively easy to create an illusion of a coherent space of great depth as well as presence. The 32 tracks were mixed to eight in a circular format, though when performed in halls with speakers on various vertical levels (such as the premiere at the Sonic Arts Research Centre in Belfast in March 2009), it proved very effective to double the eight tracks onto more than one height of loudspeakers, supporting the illusion of being in an underground cavern.

Soundscape Composition

The soundscape composition, with the interdisciplinary conceptual background of soundscape studies and acoustic communication, and the technical means of granular time-stretching (Truax 1988, 1990, 1994b) and multi-channel diffusion (Truax 1998), all of which have been developed at Simon Fraser University over the past 40 years, provides a well developed model for the musical use of environmental sound. Elsewhere I have described the ideal balance that should be achieved in such work as matching the inner complexity of the sonic organization to the outer complexity of relationships in the real world, without one being subordinate to the other (Truax 1992b, 1994a). In fact, these inner and outer sources of complexity mirror listeners’ everyday interpretation of sounds through their abilities to recognize and interpret sonic patterns at different time scales, and link them to contextual knowledge gained from their experience in the real world. The soundscape composer’s role in this regard may be thought of as enhancing the inner workings of sounds to reflect their contextual meanings, instead of leaving them as abstract entities related only to each other (Truax 2008, 2013).

I have also suggested (Truax 1996, 2002) that the characteristic principles of the soundscape composition as derived from its evolved practice are as follows:

Listener recognisability of the source material is maintained, even if it subsequently undergoes transformation;
The listener’s knowledge of the environmental and psychological context of the soundscape material is invoked and encouraged to complete the network of meanings ascribed to the music;
The composer’s knowledge of the environmental and psychological context of the soundscape material is allowed to influence the shape of the composition at every level, and ultimately the composition is inseparable from some or all of those aspects of reality; and ideally:
The work enhances our understanding of the world and its influence carries over into everyday perceptual habits.

Thus, the real goal of the soundscape composition is the re-integration of the listener with the environment in a balanced ecological relationship.

Given these intentions, the kinds of digital processing and computer-controlled diffusion techniques that we have available today form a powerful toolbox for the composer to create mimetic and abstracted environmental soundscapes — aided by immersive multi-channel diffusion — that range from the representational through to completely virtual soundscapes such as found in Chalice Well, described above. The distinction implied by the term “abstracted” is that the composer’s sound design through processing somehow brings out the internal aspects of the sounds being used, and enhances them, rather than obliterating the identity of the source material and/or imposing unrelated processing artifacts onto those sounds. The distinction is often subtle, as similar processing techniques may be used for both abstract and abstracted results. Sometimes it is a matter of the degree of processing, and in others (with soundscape composition) the relationship of the processed sound to the original may not be clear to the listener, nor does it need to be. However, even a “hidden” relationship may bring out some deeper or more intuitive connection to the real world. For instance, the processed versions of the waves and river in the first two sections of Island deliberately sound unrealistic compared to the highly realistic, close-miked recordings of the original sounds. The wave sounds are low-pass and high-pass filtered with a great deal of feedback on the resonator being used, such that the low-pass version sounds as a drone, but the source material going into the resonator still retains the rhythmic regularity of the original waves. To me, the pulsing drone heard in counterpoint with the actual waves suggests to the listener that we are visiting a foreign or even magical island open to our imagination. 
Likewise, the stretched and resonated river sounds create additional formants that resemble choral singing, as if imaginary Siren voices are enticing us to continue our journey up the river. At other points in the piece, resonators and stretched sounds merely enhance the original material and are often blended seamlessly with them.
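The general behaviour of a resonator with feedback can be illustrated with a basic feedback comb filter. This Python sketch is a generic textbook structure offered as an assumption, not the resonator actually used in Island; the function name, delay length and feedback value are hypothetical. A delayed copy of the output is fed back into the input, so that a single impulse rings on as a decaying series of echoes.

```python
def comb_resonator(signal, delay_samples, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples]."""
    out = []
    for n, x in enumerate(signal):
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out

impulse = [1.0] + [0.0] * 99
ringing = comb_resonator(impulse, delay_samples=20, feedback=0.9)
# The impulse recurs every 20 samples, scaled by 0.9 each time round the loop.
```

With positive feedback the structure resonates at multiples of the sample rate divided by the delay length, and the closer the feedback is to 1 the longer it rings, which is why heavy feedback turns rhythmic input such as waves into a pulsing drone.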

Auto-convolution as described above provides a complementary type of processing that doubles the duration of the sound and emphasizes its prominent frequencies, while attenuating weak ones. It often resembles a lingering reverberant decay, as in Temple. With my recent works Aeolian Voices and Earth and Steel (both from 2013), multiple iterations of the auto-convolved sounds not only extend the source sounds (a car passing by or a percussive sound activated by the wind whistling through a shed in the former piece, or the sounds of steel shipbuilding in the latter) but when used without the original sound suggest, to me at least, a blurred and extended memory of those sounds. This effect is particularly poignant in Earth and Steel because of the historical context of the piece’s attempted re-creation of a past era of shipbuilding, which was intensified at the premiere (2013) in an actual building at the site of the Royal Naval Dockyards in Kent, UK, which is now a museum site.

Over the past 25 years of my working with microsound or time-frequency methods, I have mainly relied on these three techniques (granular time stretching, resonators and convolution) because they each bring out the inner qualities of the environmental and studio-recorded vocal or instrumental sounds I have been using. I sometimes refer to the nature of these processes as allowing me to compose through sound, not just with it. In other words, form emerges from the inner structure of the sound material, guided in its elaboration by contextual knowledge.

Conclusion

The long history of science serving art is clearly continuing into the present era, with one of the most intriguing pathways involving the quantum level of microsound, or what might be called “the final frontier” of acoustic and musical research. Art in the service of science has made fewer, but still significant, contributions, such as the artistic visualization and musical sonification of databases. The computer is central to both types of process. However, as in any close relationship, each of the partners may be changed by the encounter. In fact, we could analyse artist-machine experience along the lines of whether the technology plays a servant role in merely assisting in the production process (what I refer to as computer-realized composition), or whether it participates in a manner that changes the artistic vocabulary, process and ultimate result. This latter type of process ranges from the interactive partnership that might be called computer-assisted composition, through to a fully automated, rule-based computer-composed type of work. I have suggested here that when the computer is used to control complexity that results in emergent forms, the role of the artist changes profoundly and can take many forms: guide, experimenter, designer, visionary and poet, to name but a few. In my own work, I have experienced elements of all of these roles, but what sums them all up is my role of relating the inner complexity of the micro-domain sound world to the outer complexity of the real world in all of its natural, human and social dimensions. It is a journey that I find particularly inspiring.

Bibliography

Clarke, Michael. “Composing at the Intersection of Time and Frequency.” Organised Sound 1/2 (August 1996) “Time Domain,” pp. 107–117.

Gabor, D. “Acoustical Quanta and the Theory of Hearing.” Nature 159 (May 1947), pp. 591–594.

Roads, Curtis. The Computer Music Tutorial. Cambridge MA: MIT Press, 1996.

_____. Microsound. Cambridge MA: MIT Press, 2001.

Truax, B. “Real-Time Granular Synthesis with a Digital Signal Processor.” Computer Music Journal 12/2 (Summer 1988), pp. 14–26.

_____. “Composing with Real-Time Granular Sound.” Perspectives of New Music 28/2 (Summer 1990), pp. 120–134.

_____. “Composing with Time-Shifted Environmental Sound.” Leonardo Music Journal 2 (December 1992) “Building Bridges,” pp. 37–40.

_____. “Electroacoustic Music and the Soundscape: The inner and outer world.” Companion to Contemporary Musical Thought Vol. 1. Edited by John Paynter, Tim Howell, Richard Orton and Peter Seymour. London: Routledge, 1992, pp. 374–398.

_____. “The Inner and Outer Complexity of Music.” Perspectives of New Music 32/1 (Winter 1994), pp. 176–193.

_____. “Discovering Inner Complexity: Time-shifting and transposition with a real-time granulation technique.” Computer Music Journal 18/2 (Summer 1994) “Composition and Performance in the 1990s (1),” pp. 38–48 (sound sheet examples in 18/1).

_____. “Soundscape, Acoustic Communication and Environmental Sound Composition.” Contemporary Music Review 15/1 & 2 (1996) “A Poetry of Reality; Composing with Recorded Sound,” pp. 49–65.

_____. “Composition and Diffusion: Space in sound in space.” Organised Sound 3/2 (August 1998) “Sound in Space,” pp. 141–146.

_____. “Techniques and Genres of Soundscape Composition as Developed at Simon Fraser University.” Organised Sound 7/1 (April 2002) “Circumscribed Journeys through Soundscape Composition,” pp. 5–14.

_____. “Soundscape Composition as Global Music: Electroacoustic music as soundscape.” Organised Sound 13/2 (August 2008) “Global Local,” pp. 103–109.

_____. “From Epistemology to Creativity: A Personal view.” Journal of Sonic Studies 4 (2013).

eContact!
