I am Sitting on a Fence

Negotiating sound and image in audiovisual composition

Tremendous effort has been exerted since the inception of musique concrète to stave off the influences and workings of the visual and to privilege the purely aural in electroacoustic music. In moving from its birthplace of the radio to the conventions of the concert space, electroacoustic music has cultivated a somewhat ambivalent relationship with the visual domain. The integration of common technical procedures in the tools and operations of composing with sound and image, the maturing practice of live performance of sound and image, and the evolving concept of aurality redefine artistic practice and cultural engagement with the intertwined relationship of the aural and the visual. Language, function, radio and space are considered in the negotiation of the perceptual, technological, linguistic, historical and social issues characteristic of the practice of contemporary audiovisual composition.


I begin with a confession: as a composer, as a worker in sound, I have begun to work with images and video. How did this happen? Is this decision a naïve response to the question “Do you have any video work to send us?” so commonly asked of composers today? Is it a defensive reaction to the maelstrom of audiovisual stimuli that one is bathed in daily, or is this a sincere artistic need motivated by a historical confluence of skills, technologies and media enculturation? Does the practice of electroacoustic composition have something unique to contribute to the integration of image and sound? Having worked for decades with the sight and sound of live musicians and the sound and often-inferred images of electroacoustic music, “I am sitting on a fence” as to how to reconcile the two sides of the perceptual coin, so to speak, and how to penetrate the region “in between” sight and hearing.

Tremendous effort has been exerted since the inception of musique concrète over 60 years ago to avoid, repress, or at the very least carefully negotiate the influences and workings of the visual and to establish and maintain a purely sonic practice of perception, conception and reception for the art form we now call electroacoustic music. In fact, electroacoustic music, in the broadest sense, has always cultivated a vital, if somewhat ambivalent, relationship with the visual domain, from the use of visual metaphors to denote and clarify concepts and attributes, to the impetus of the visual in programmatic objectives and compositional narrative; from the visual tenets of concert presentation amid a staunchly anti-visual “acousmatic” rhetoric, to the physical surrogacy of time and space navigated via the visual interface and iconography of the computer.

At EMS 2005, Diego Garro stated, “composers of Electroacoustic Music worldwide seem to be drawn almost instinctively to a new, exciting media which combines electroacoustic sounds with the moving image” (Garro 2005). Hunt, Kirk, Orton and Morrison stated ten years ago that it was urgent to “begin at least a preliminary debate on the organisational framework for the combination of [image and sound]” and asked the question, “What does it mean to compose an audiovisual work?” (Hunt et al 1998, 199).

Tools that make common the tasks and procedures of the capture, modification, composition and presentation of sound and image would now appear to be in place and key to the current fluency of audiovisual exploration and production. Yet, beyond the fact that readily available technologies facilitate exploration and the procedures of creating with sound and image, several other factors must be considered in order to fully apprehend the present inclination towards the audiovisual, especially by those whose point of departure is the sonic.


Amid the audiovisual conditioning of the “cinematic experience” of television, mobile communications, the Internet and immersive gaming, “cinéma pour l’oreille”, “i-son”, “vidéomusique”, “visual music”, “objet audiovisuel” and DJ/VJ practices are drawn into an environment of hybridity and interdisciplinarity as a norm of artistic practice. In the audiovisual discipline, there is a “blurring of the boundaries between levels of production” (Jordan 2007, 2) that invites close inspection of the “connection of musical patterning to visual patterning” (Kendall 2005, 143). The view that music is a sonic art alone may no longer be tenable. Richard Leppert, in The Social Discipline of Listening, reminds us that “listening is not properly understood as a biological phenomenon, rather a historico-sociocultural one: the listener is framed by history, society and culture” (Leppert 2004, 27). Douglas Kahn observes that “the history of the arts using auditive technologies, including those in concert with vision, constitute a large, rarely acknowledged portion of the history of the media arts” (Kahn 1999, 15). Denis Smalley has asserted that “musicians, because of their interactive relationship with sounds and sound-structures, tend to regard the vision-field as extrinsic to music” but that “music, and electroacoustic music in particular, is not a purely auditory art but a more integrated audio-visual art” (Smalley 1995, 90). And finally, Simon Shaw-Miller contends that “notions of media purity in modernism are the historical exception. The conception of fluid boundaries between the sonoric and the visual … is a closer reflection of artistic practice throughout history, than the seeking out and patrolling of borders on the basis of time, space and media” (Shaw-Miller 2002, x).

Video 1 (0:53). Excerpt of Laurie Radford’s Filling (2007 / 15:00), videomusic.

Vision and hearing have been harnessed by science and industry to provide new tools for evaluation and control and have been rigorously employed by various forms of social engineering and governance. In the current nomadic urban culture, one finds the “phenomenon of [an] æsthetization of the visual environment through music” where the walking or motoring listener “is particularly sensitive to the connection between eye and… ear,” what Jean-Paul Thibaud calls the “visiophonic knot”, “the convergence point between the audible and the visible” (Thibaud 2004, 337). Current theories of intertextuality highlight this multi-sensorial environment “in which images, sounds, and spatial delineations are read on to and through one another, lending ever-accruing layers of meanings and of subjective responses” (Rogoff 1998, 14).

The cross-disciplinary mandates of most major media research and production centres, and the burgeoning crops of new programmes in sonic art offering audiovisual training — now apparently considered a fundamental skill for composers — herald a recognition of new modes of perception and production in the audiovisual domain to which sonic art practitioners and consumers of sound and image now increasingly subscribe.

Following the introduction of the electronic recording and manipulation of light and sound, there were numerous concerted attempts to discover and form direct relationships between sound and image by visual artists, filmmakers and musicians. Edmonds and Pauletto speak of an “audiovisual discourse” emerging from the “search for unifying principles that could explain and summarise our multi-sensory experience of the world… a history of scientific discoveries, evolution of technology, perception studies and artistic outcomes” (Edmonds and Pauletto 2004, 115) from Pythagoras through nineteenth-century colour organs to the audiovisual instrument par excellence, the computer. Grierson has proposed the idea of “an audiovisual meta-discipline, that combines aspects of [music and] film composition, experimental visual art and digital media, based on studies in perception, and Michel Chion’s theoretical approach to cinema sound” (Grierson 2005, 2).

In this context, the past century of “acousmatic” loudspeaker listening may appear like an anomaly with which we continue to grapple, a forced bifurcation of sound and image, the artifice of a “music alone” at a time when cultural trends seek increasingly to promote the opposite: a mediated fusion of sound and image. As we celebrate 60 years of a distinctly and purposefully engineered “listening art” and a concerted effort to develop and promote multiple purposes for the practice of listening, it may be beneficial to reconsider the relationship of this sonic art to its sister sense, the visual, poised both threateningly and enticingly on the other side of the fence. Although there are many elements to consider and manners of entry into this investigation, we will limit ourselves today to four — those of language, changes to the functions of vision and hearing, radio and space.


We have no word, at least in English, for the simultaneous act of seeing and hearing, except to slide into the vague territories of experience and perception. Nor do we have a word for an individual or group that both hears and sees an artistic event (“audience” and “viewer” having obvious uni-modal origins). One often speaks of photographic memory; one rarely speaks of “sonographic” memory, despite the fact that most people can instantly recognize the origins and signification of hundreds if not thousands of natural and man-made sounds. Language, residing in speech and the written word, is an arbiter of seeing and hearing. We negotiate seeing and hearing and states of inter-modal perception through the agency of language.

For example, the “sonic image” and the “sound idea” are guiding concepts in sound art practice. Barthes reminds us that the word “image” is etymologically linked to imitari, to imitate, and by extension mimesis, which Emmerson (1986, 19) clarifies as denoting “the imitation not only of nature but also of aspects of human culture not usually associated with musical material.” The word “idea” has its origins in the Greek verb “to see” and is frequently linked to ancient optics and theories of perception (Manovich 1993, 48). Our “imagination” has, at its root, the concept of the visual and the imagined, and the construction and conception of an “image” is a precondition for the production and apprehension of “meaning.” John Young, in a discussion of the “sound image” in 2008, asked, “If one cannot imagine a sound, how can one meaningfully perceive [a] sound ‘image’?” Another example is the current language employed to describe and discuss the concepts of space and distance, so important in most electroacoustic music. The stereo “image” as well as the concepts of quadraphonic, octaphonic and 5.1 are rooted in geometry and physical coordinates established through visual confirmation and delineation.

Is there a new terminology, a new language emerging from the synthesis of the codes and practice of audiovisual composition? Nicholas Mirzoeff, in an introduction to the relatively new research area of visual studies (1998, 5), contends that the spoken word and its manifestation in the written text, although until recently the privileged form of intellectual practice, has given way to a “visual literacy [that] may not be fully explicable in the model of textuality.” A final query regarding the role of language in this context: why do we have iPods (read: eyePod) for listening and not earPods!?


Anne Friedberg, citing Jean-Louis Comolli’s Machines of the Visible, highlights the “frenzy of the visible”, the “social multiplication of images” and the turning of “visualized experience into commodity forms” in a discussion of visual culture at the end of the nineteenth century (Friedberg 1998, 253). Entering into the same discussion, Rogoff speaks of the “heavy burden of post-Enlightenment scientific and philosophical discourses regarding the centrality of vision for an empirical determination of the world as perceivable” (Rogoff 1998, 21). Yet, media theorist Lev Manovich, in his dissertation The Engineering of Vision from Constructivism to Computers, contends that “vision is not a timeless concept; rather, each period understands vision differently depending on how it is used” (Manovich 1993, 1). “Philosophical concepts and arguments have long linked vision with reason and knowledge” and “vision and its mediation (in art) [are] objectified very early on through the canvas and frame, the wall and gallery, then the photograph, and finally given temporal motion and significance through film and animation” (Ibid., 36–37). Hearing only achieves this objectification very gradually during the 20th century with telecommunications, radio and the sound recording as object.

Manovich asserts that “new techniques of visual representation have risen to fill what can be called the cognitive needs of modernization” (Ibid., 72) with vision employed as a code, as a means of logical reasoning, as a way to capture spatial information and process information. He offers the example of how the physical demands and entrained gestures of late 19th-century industrial society were replaced by a modern skill set that emphasized visual tracking, monitoring and selection, and by extension, ushered in the automation of vision to record and visualize geometric and topographic information. Witness the current use of CCTV — the camera as all-seeing eye — as a widely applied “solution” to security and control.

The result of a convergence of the languages and technologies of cinema and computing over the course of the 20th century is the emergence of a distinctively new language, driven and delineated by computing logic, database functionality and an evolving human-computer interface: a language of new media. Aden Evens, in discussing the relationship of music and machines, notes that “literacy these days means speaking the language of new media, which is in many ways the language of cinema” (Evens 2005, 22).

After a century of loudspeaker and schizophonic listening (Schafer 1977, 88), and of new uses for aurality such as territorial signalization, spatial mapping via radar and the evaluative roles of data sonification, has there been a parallel evolution and fundamental change to the functions of listening? Just as a historicising of vision rejects the function of seeing “as a natural or a cultural constant” (Manovich 1993, 6), a historicising of hearing may reveal fundamental changes to the act of listening in the electronic age, its uses moulded by the development of rigorous, function-driven listening strategies and needs. There has been an explosion of terminology and concepts to describe and consider the aural experience, from consciously categorized and æsthetically promoted modes of listening (Schaeffer 1966) and listening relationships (Smalley 1995), to the ever more rarefied skills of technological listening and spatial acuity.

The totality of these skills and the belief that “listening” is a multifaceted and profound mode of sensing, knowing and engaging with the world indicate radical changes to what is understood as and demanded of the act of listening. The recent merger of sound and image in audiovisual composition is perhaps the site of a consolidation of these radical changes to the functions of both viewing and listening. As such, understanding these changes and how they inform the creation and perception of audiovisual composition is a challenge of paramount importance for the discipline.


It is perhaps the radio, affording the wireless voyage of a disembodied voice, that best represents the origins of acousmatic listening well over a century ago. The earliest ambitions of radio were for an artistic expression released from the bounds of space, as voiced in Rudolf Arnheim’s enthusiastic entreaty in 1936 for “an entirely unexplored form of expression in pure sound” (Arnheim cited in Kahn 1999, 4). Although quickly constrained and harnessed by governmental regulation, this fledgling art was given new breath and breadth with the phonographic experiments of Schaeffer and Henry and the rise, at least in Europe, of the radio as laboratory for an art of sound. Radio drama, the Club d’Essai, and the first broadcasts of Symphonie pour un homme seul heralded the triumphant return of an ageless aurality, an escape from the confines of the body, the crushing weight of terrestrial location, and the crippling accuracy of the visual. Yet, almost immediately this art form was torn from its newfound home and flung upon the concert stage, where its incorporeal state baffled, and continues to baffle, the viewing listener.

Did electroacoustic music in its infancy abandon its birthplace and the roots of a genuine aural art by aligning itself, not with radio, but with the conventions of the Western concert canon? Shackled by the visual and social conventions of the Western concert tradition, amid a post-war climate where that very tradition was being vigorously renewed, electroacoustic music abandoned this initial source of “pure sound”, of radiophonic concepts that one can only speculate may have helped to distinguish its initial modus operandi and objectives from the overwhelming forces of the visual. The techniques and language of film were concurrently appropriated into this new sound art of the 1950s, a seamless procedural invasion that is repeated today in the appropriation of techniques and concepts of cinematic language and the data-driven audiovisual world of the Internet and the computer into audio design and sound art composition.

Douglas Kahn identifies an historical problem inherent in considering radiophonic and sonic art: “despite the cultural pervasiveness of sound, there was no artistic practice outside music identified primarily with aurality” (Kahn 1992, 2). The dearth of documentation and historicising of listening presents a theoretical obstacle in a consideration of the recent convergence of the visual and the aural in audiovisual composition, and requires efforts in sound art research similar to the archaeology of vision presently underway.


A final point to consider in the negotiation of sound and image in audiovisual composition is space, as concept and perceptual construct, as compositional parameter, as an experience that potentially binds the senses of seeing and hearing. Space, perspective, geometry, distance and volume have long been areas of study and consideration for the visual arts, both as concrete issues of practice and as poetic metaphors for concept and theoretical elaboration. The concept of space has only recently been adopted by music following the uprising of timbre and spectrum against the tyranny of pitch and rhythm. Electroacoustic music studies have contributed a conceptual foundation and a wealth of terminology dealing with space and its role in listening and new sonic art practices. Space, along with spectral control, has been eagerly adopted as a privileged terrain of practice and theory in electroacoustic music and the particular practices of spatial and spectral design and listening have become the distinguishing characteristics of the art.

Video 2 (0:30). Excerpt of Laurie Radford’s Filling (2007 / 15:00), videomusic.

We turn to Manovich again and his consideration of Erwin Panofsky’s Perspective as a Symbolic Form from 1924–25 wherein “a parallel between the history of spatial representation and the evolution of abstract thought” is established. Panofsky proposes that spatial representation “moves from the space of individual objects in antiquity to the representation of space as continuous and systematic in modernity… from an ‘aggregate’ space to ‘systematic’ space” (Manovich 1993, 102). Manovich’s investigation is extended via the work of cognitive linguist George Lakoff who asserts that “all semantic concepts and abstract human reasoning are based on metaphorical mapping of spatial concepts… [and that these] spatial metaphors are at the core of the semantics of human language” (Lakoff cited in Manovich 1993, 44–45). If, as we have noted earlier, language is an arbiter of seeing and hearing, and, if spatial metaphors are the foundation of language, might space as both conceptual field and pragmatic component serve as a unifying and motivating force for audiovisual work? Notwithstanding the dissonance in spatial concept between sound and image in standard audiovisual presentation today, with a regimented surround-sound audio projection acting like a spatial prosthesis for a severely bounded, two-dimensional visual frame, perhaps the present fascination with and earnest research about space in the sound arts can serve as a catalysing mechanism with which to bind sound and image together in the discipline of audiovisual composition.


The classic view of stimuli captured by the “unknowing” organs of sight and sound that feed data to the “higher mental faculties” for decoding, reasoning, understanding and responding is giving way to a more holistic, ecological, embodied view of knowing and reason, wherein the roles of the eyes and ears are those of active partnerships with the physio-neurological and brain systems (Manovich 1993, 44–45). Jerry Holsopple, drawing upon Gregory Ulmer’s work on electronic languages and literacy, outlines a social history that has evolved from oral to literate to an electrate era in which “the processing and intellection of information and the construction of meaning is no longer via the spoken or written word, but is predominantly through the congruence of image and sound” (Holsopple 2003, 16).

Although we will undoubtedly continue to be influenced by inherent neurobiological and culturally paradigmatic relationships between ear and eye, if we concede that our modes of listening have evolved and been refined to distinguish and appreciate the detail and language of new genres of sonic art, in parallel with the enormous changes demanded of viewing and spectatorship after a century of intense sound and image production in popular culture, we may now be able to “get off the fence” and begin in earnest to negotiate a path towards an inter-modal art of audiovisual composition.


