A Phenomenological Time-Based Approach to Videomusic Composition
Chapter 2: Movement and Gesture in Videomusic
2.1 The Perception Of Sound And Image
The most thorough study available on the relationship between sound and image is Michel Chion's seminal book Audio-Vision, which examines audio-visual interaction in film. According to Chion, the principle of added value is at work whenever an audio-visual work is interpreted. He defines this principle as:
… the expressive and informative value with which a sound enriches a given image so as to create the definite impression, in the immediate or remembered experience one has of it, that this information or expression “naturally” comes from what is seen, and is already contained in the image itself. (Chion 1994, 5)
He claims that added value is one of the foremost factors in the audio-visual contract — the binding sensorial relationship between sound and image as expressed in film. Chion goes on to divide the sensory spheres of influence within this contract, beginning by carefully observing the separate natures unique to aural and visual stimulation and then examining their respective properties.
For Chion, the differences between visual and auditory perception are grounded in an understanding not only of temporal and spatial dimensions but also of our perception of movement within them: “For one thing, each kind of perception bears a fundamentally different relationship to motion and stasis, since sound, contrary to sight, presupposes movement from the outset” (Chion 1994, 9). Sound is completely rooted in time and cannot be uncoupled from the experience of duration. There is no such thing as a snapshot of sound. Even sounds artificially created to be static, such as an unmoving fixed pitch, are given their resolution within the domain of time and need this domain in order to be experienced. Further, such sounds do not appear in the natural world. The ones that do often contain vectorized aural demarcations, such as transients and reverberation tails, that indicate directionality. A still image, on the other hand, exists entirely outside of the domain of time, except possibly through cultural references embodied within the image’s content. In the case of the moving image, temporal information can only be conveyed by animations that are vectorized either by cultural signification or by distinguishable physical properties, such as gravity or inertia, that have their own temporal dynamic (Ibid., 10).
To illustrate the absent temporal vectorization of certain images, consider the filmed image of an old man sitting on his front porch in a rocking chair. If the only movement in the image is that of his chair rocking back and forth, it is impossible to tell from any still frame in what direction he is rocking. Moreover, once all of the stills are ordered to comprise the entire movement, reordering them in reverse will give a result just as easily accepted as natural by the eye. If the rocking chair were instead falling from the sky, gravity would immediately mark the backwards sequence as suspect of some kind of temporal manipulation. The same image of an old man in a rocking chair accompanied by the sound of the chair rocking back and forth cannot be reversed so easily. The sound of a chair rocking backwards is quite different from the sound of a chair rocking forwards played backwards. This is especially true if there is any echo or reverberation on the sound.
Chion notes that the speed of perception is different between the senses, and so movement cannot be uniformly traced between them. In particular, the auditory sense is much faster than the visual one at analyzing, processing, and synthesizing, and is thus much more proficient at extracting the contours of movement. In fact, Chion states that: “as the trace of a movement or a trajectory, sound … has its own temporal dynamic” (Ibid., 10). He also notes: “the eye perceives more slowly because it has more to do all at once” (Ibid., 11). This can be demonstrated by watching a quick movement of an actor’s hand on film. The endpoints of the actor’s gesture are easily extracted, but the path taken by the hand is not well defined, even though it is perceptible to the eye.
Undoubtedly, the visual weight associated with the endpoints of the actor’s gesture is due to their relative stillness and thus their perceptibility. By comparison, a sound trajectory made at the same speed is continuously perceptible. However, the impression of continuity given by the ear is a false one. Current research shows that the ear listens, evaluates, and remembers in very short windows of about two to three seconds as a sound evolves (Ibid., 12). These windows are perceived at once as a connected whole very shortly following sound events, not simultaneously with them. These facts demonstrate a substantial difference in the way that movement is experienced and perceived by each sense. In the visual domain, movement is not experienced uniformly over time. In the aural one, movement is established through time-based comparison.
Image and sound are able to inform each other when they are experienced together. Chion posits three aspects of temporalization by which sound establishes the perception of time in an image. The first, temporal animation of the image, involves the scale by which time is represented. The second, temporal linearization, imposes a sense of succession on montage. The third, vectorization, creates a feeling of imminence and expectation (Ibid., 13) which can be used to generate elements of composition or narrative. These factors are dependent on the natures of the sound and image being presented together and rely on several aspects, including whether the image has any temporal animation itself, the nature of the sound characteristics, the sound’s rate of change, how predictable the sound is, its tempo, and its frequency definition (Ibid., 15). With regard to visual movement, sound adds a stabilizing effect to its perception.
Although sound certainly contains spatial information, Chion explains the role of image in the audio-visual contract as creating a spatial magnetization for sound (Ibid., 70). An audience has the tendency to attribute movement and even spatiality (implied by elements of the visual image) to the sounds accompanying them when tightly coupled. When a sound is married with a perceived source on the screen, an audience will mentally localize the sound as if it emanated directly from the perceived source, even though this source is nothing but projected light. This principle is used extensively in modern soundtrack sculpting and is why monaural soundtracks function. Chion does note that this illusion is broken to varying degrees when a sound contains excess panning and spatialization. From this we can conclude that in the audiovisual contract the spatial localization of movement is dominated by the visual sense but not lacking in the aural one.
Chion makes a strong statement about the way the senses inform each other in the audio-visual contract. He argues that the aural and visual senses actually act quite differently perceptually and that it is their influence upon each other that is responsible for the clarity of the modern cinematic experience as they lend “each other their respective properties by contamination and projection” (Ibid., 9). This kind of contamination and projection has a name: trans-sensoriality. According to Chion:
Everything spatial in a film in terms of image, as well as sound, is ultimately encoded into a visual impression, and everything that is temporal, including elements reaching us via the eye, registers as an auditory impression. (Ibid., 127)
This division of sensory information into the spatial and the temporal takes no notice of which sense actively perceived the underlying information in the first place.
Chion effectively suggests that the audiovisual contract blurs the roles of the senses, or perhaps more specifically that the roles of the senses are already blurred and that the audiovisual contract tightens their perceived functionality. This argument implies a great divide between the way in which the senses gather information and the ways in which this information is classified by mental processes afterwards. Chion explains:
Trans-sensoriality has nothing to do with what one might call inter-sensoriality … In the trans-sensorial, or even meta-sensorial model … there is no sensory given that is demarcated and isolated from the outset. Rather, the senses are channels, highways more than territories or domains. (Ibid., 137)
This is particularly significant when it comes to understanding the way that movement is perceived between tightly coupled sound and image. Chion believes that the experience of movement in time-based media may be constructed through this process rather than explicitly manifested with raw sensory information:
When kinetic sensations organized into art are transmitted into a single sensory channel, through this single channel they can translate all these signals at once. The silent cinema on one hand, and concrete music on the other, clearly illustrates this idea. (Ibid.)
Trans-sensoriality is fundamental to the understanding of gesture in videomusic because it explains how a movement can be attributed to both the aural and visual senses without necessarily having its spatial and temporal aspects accurately represented by each medium.
Consider a gesture in which a sound and abstract shape seem to move together upwards along a screen. For this gesture to be successful, no physical change need be made in the placement of the speakers from which the sound emanates, nor do the frequencies of the sound need to shift with the precise psychoacoustic mechanisms that would allow our hearing to spatialize the sound as coming from above. Instead, some other shift in the sound might indicate a transformation while the shape it is associated with spatially moves upwards on the screen. The fact that the sound’s spatialization does not reflect the same reality as that of the image does not ruin the gesture. Instead, the sound is unquestioningly felt to rise spatially with the image. Though this example demonstrates the trans-sensorial nature of aural and visual perception in such a situation, it does not explain how these elements can be perceived as bound to each other. Sound and image elements must already be perceptually linked for this principle to be in effect.
To discuss events that take place simultaneously within the aural and visual domains Chion invokes the notion of synchresis, “the forging of an immediate and necessary relationship between something one sees and something one hears at the same time (from synchronism and synthesis),” and notes that: “the psychological phenomenon of synchresis is what makes dubbing and much other post-production sound mixing possible” (Ibid., 224). Synchresis is best demonstrated by paying close attention to animated films in which sounds that are synchronized to perceived events, such as the footsteps of a lead character heard as the character walks across a room, become tied to the events. Clearly there is no actual sound being made by a drawing whose movements are depicted through the motion of ink over successive frames. Still, this does not occur to an audience watching such a film. The footsteps become the natural sound of the character walking by virtue of their timing, associated as they are with the movements of the character. This effect, though not as obvious, is still at play when sounds are added in post-production to a filmed sequence. Whether or not the sound tied to a visual event actually stems from that event, temporal synchronization allows a sound to perceptually marry an event in such a way that they are perceived as a unit of action.
Synchresis is used in many ways in narrative film. It can be used to linearize events and to inscribe events into real time (Ibid., 14), such as in a situation where a rapid montage of several events is shown over the sound of a heartbeat and so can be understood to be happening at almost the same time. Synchresis can also draw attention to specific events by using sound to focus perception on one exact moment, such as when a gunshot is heard with the firing of a gun or with a body falling backwards. Synchresis can also be used to tie together movements or events in sound and image which might otherwise not be perceptually linked, or whose contours might otherwise not be perceptually visible. For example, the sound of a gunshot heard as a pair of eyes close can suggest the act of suicide, or, to restate an earlier example, a sound accompanying a quick hand gesture can help delineate its path. Although these uses are easily exemplified in terms of traditional representational film narrative, they are not limited to that domain and in fact apply equally to abstract image and sound relationships. This can be easily witnessed by watching sound and image experiments such as Norman McLaren’s, where sound painted on the film’s soundtrack behaves in a synchretic relationship with the animated dots or splashes of colour painted on the film’s frames, seeming to belong to the abstract shapes.
In videomusic, synchresis is the major mechanism by which aural and visual events are perceptually linked; however, it is the trans-sensorial perception of their kinetic properties that allows this union to take place. This raises the question of what information is transmitted through kinetic animation. Chion’s observations on the nature of sound and image in film unveil their shared properties as well as their differences. Both sound and image in film are time-based media, and their common temporal structuring when used synchretically reinforces the experience of their movement and duration. The tight perceptual blurring that happens between the senses in such a situation is a result of the way that these two properties are mutually reinforced and of the way that the human brain interprets multiple streams of information. Movement and duration are in a sense the most important elements of any time-based composition. In narrative film these principles are not the first discussed, but in music they are paramount. When time-based media are stripped of their concrete referents, texture, movement and duration are all that are left, and it is movement and duration that voice texture over time.
Before continuing, it is relevant to ask how and when film theory can be used to discuss video, and in particular videomusic. The difference between film and video is not simply one of analogue versus digital medium, as might be assumed. First, film, which is usually classified as an analogue medium, is constructed through the sampling of a moving image at 24 frames per second. When film is projected, it is reconstructed by the eye into a continuous image. The integration of discrete samples to form a perceivable whole is one of the main criteria usually associated with digital media (Manovich 2001). Second, both film and video are analogue in perception (Massumi 2002, 138). Culturally, film and video have separate histories, which means that they rarely represent the same content; however, the main perceptual differences between film and video stem from the ways in which the moving image they depict is sampled and reconstructed, and the conditions under which it is presented. These discrepancies are collapsing under the modern development of shared paradigms for editing and presentation, as well as through cross-medium æsthetic discourse.
Chion specifically addresses the format of the modern music video as one in which the video is “fully liberated from the linearity normally imposed by sound” (Chion 1994, 167). He argues that time in the average music video is not dramatic, and that visually music videos alternate only between a few different scenes. The main reason that music videos function is because of the relationship between sound and image that they generate, but this is done only through occasional points of synchresis in which an oasis of temporal connection is generated. Otherwise, the sound and image tend to float according to their own internal logics, meeting only at the next connected place. Videomusic experiments with synchresis at a far deeper level than pop culture’s music video practice. Contemporary work demonstrates a continuous pull between instances of shared temporal structure when sound and image are animated or marked as a unit, and separate ones when they are not. Most importantly, these shared temporal structures are often maintained and developed as compositional entities in their own right.
In music composition, a gesture is a discrete piece of musical dialogue that indicates intentionality (Popek 2007). In order to do this, a gesture must be easily separable from its surrounding context, which implies having a distinct contour. It is the shape or change of an easily definable property over time that differentiates an object from a similar stream. (2) Abstracting from this definition, a compositional gesture can be said to be a movement or animation of a property whose contour is well-defined and which indicates some kind of intentionality with regards to its context. Within the domain of sound, the most obvious contours are established by changes in volume, especially when a sound stops or starts. Contours can also be traced from movements in pitch or from any other easily extractable perceptual property, including melodic phrasing. To discuss shared compositional structures over combined image and sound material, it is necessary to explore the ways in which properties between media can be paired. A single gesture is made manifest between such properties when their animations share the same contour, as the sketch below illustrates.
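As a concrete illustration of such a pairing, here is a minimal hypothetical sketch in Python (not drawn from the source text; the names, rates, and value ranges are invented for illustration only). A single contour function drives both an aural property (pitch) and a visual property (the vertical position of a shape on screen), so that the two animations share one contour and can be read as a single gesture.

import math

DURATION_S = 2.0   # length of the gesture in seconds (assumed value)
FRAME_RATE = 30    # visual sampling rate in frames per second (assumed value)

def contour(t):
    """The shared gesture contour: a smooth rise and fall over
    [0, DURATION_S], normalized to the range [0, 1]."""
    return math.sin(math.pi * t / DURATION_S)

for frame in range(int(DURATION_S * FRAME_RATE) + 1):
    t = frame / FRAME_RATE
    c = contour(t)
    pitch_hz = 220.0 + c * 440.0  # aural property: pitch rises from 220 Hz towards 660 Hz, then falls
    y_pos = 0.1 + c * 0.8         # visual property: the shape rises on screen, then falls
    print(f"t = {t:4.2f} s   pitch = {pitch_hz:6.1f} Hz   y = {y_pos:4.2f}")

The particular mapping ranges are arbitrary; what constitutes the gesture is the shared shape of the two animations over the same duration.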
The field of Gestalt psychology offers a concise explanation of why a shared gesture can exist through the Law of Common Fate, which explains how the mind unifies perception in movement. Objects that change at the same speed over time become linked together mentally through observation and do not require separate attention. In fact, they are categorized perceptually as a single object (Pedroza 2007). This principle is predominant in videomusic and explains why shared movement between sound and image necessitates a single dialogue. In a sense, a parallel can be drawn with the compositional tension that occurs between consonance and dissonance in traditional tonal music. In videomusic such a tension is derived from the integration of unified and non-unified aural and visual movement. The compositional frame of such a work does not belong solely to either of the media it is constructed from, but instead is a product of all of the gestures that can be culled from its components. Its structure is reinforced in the gestures that co-exist between both media. Movements in the visual and aural fields that share a contour can inherit properties from each other through the principles of “added value” and trans-sensoriality. This allows such movements the perceptual abilities of both senses, and as such they are better defined in both time and space and can be easily extracted from their multiple domains.
2.2 The Experience Of Videomusic
Noë has written extensively about the perception of movement in the visual field. He argues that the blind spot in the human visual field is ample evidence that human beings do not actually need to carry around an internal map of their surroundings, but instead recreate it on demand as they interact with the world around them. He points out that human vision is never still; it is continuously active and centres on the perception of movement, both in external objects and in the way that such objects change in the visual field of the observer as the observer moves. According to Noë, perceivers understand the way that their motor-sensory mechanisms are paired with the external world, and so these two types of movement produce sensory change that reveals to them the layout of their surroundings (Noë 2004, 66).
Through a thorough examination of the physical characteristics of the visual field, Noë demonstrates that there is a large difference between what an observer sees and what is understood as the experience of seeing. The visual field is a construction developed through relational movement and the promise that details can be summoned when needed by the eyes. Vision does not reveal the world with the kind of clarity with which it is experienced. Instead, vision traces movement, and traces contour through movement. After describing the way that a closed-eye observer would discern the shape of a bottle through touch, Noë remarks that:
Vision is touch-like. Like touch, vision is active. You perceive the scene not all at once, in a flash. You move your eyes around the scene the way you move your hands around the bottle. As in touch, the content of visual experience is not given all at once. We gain content by looking around as we gain tactile content by moving our hands. You enact your perceptual content, through the activity of skillful looking. (Noë 2004, 73)
The implications of this are extensive when it comes to understanding the relationship between perception and movement.
Movement delineates the observable world. Analogous arguments to those made by Noë about the act of looking can be made for that of listening, which is also an active, touch-like process, wherein movement resolves temporally instead of spatially. For music, as for video, it is movement that is responsible for guiding the audience’s attention through a composition. In the case of videomusic, this movement might occur aurally, visually or, most profoundly, as a perceptual whole that manifests through paired properties within both senses and their changing relation to each other. It is this third type of movement that dominates the experience of videomusic.
The perception of movement happens on both a qualitative and a quantitative scale. Relational physical processes that can be understood only qualitatively exist in what is externally perceived to be a well-ordered, quantitative, rationally structured space. In his essay “Strange Horizon,” Brian Massumi examines the relationship between proprioception and vision. He describes a situation in which he mistook one street outside the window of his office for another over an extended period. At the time, he had believed that his senses were well-positioned and accurate compared to any external measure of direction. While trying to understand his delusion, he considers his own experience of orientation and movement and determines that both proprioception and vision play important roles in his self-navigation.
When Massumi tries to re-create the experience of orientation and locomotion in his mind he discovers that his memories of any specific journey are mostly of the physical movements he must use to undertake it, with visual indices playing an important corrective role. He concludes that self-navigation is not a matter of moving a body through a series of indexed points in Cartesian space, but instead results from an internal awareness of comparative movement, which is corrected and guided through the use of landmarks and other visual diagnostics. From this, Massumi postulates that navigation through proprioception:
has significant implications for our understanding of space because it inverts the relation of position to movement. Movement is no longer indexed to position. Rather, position emerges from movement, from a relation of movement to itself. (Massumi 2002, 180)
If indeed this is the case with human understanding of space, movement occupies a privileged place in the compositional sphere. In the case of videomusic, the relationship between different sensory channels enhances the perception of movement such that fixed points of reference appear only as a kind of relative indicator of direction and process.
Massumi’s deconstruction of the experience of navigation implies that we reside in a relational space where events and objects owe more to their changing relations to each other than to any fixed external unit of measurement. Because each of these relations is continually moving, and so is vectorized, he extends his findings to cover the whole of these transformations by introducing the field of topology. In this way he is able to talk about movement separate from any static form:
The distinction that is most relevant here is between topological transformation and static geometric figure: between the process of arriving at a form through continuous deformation and the determinate form arrived at when the process stops. (Massumi 2002, 184)
When considered through this lens, all fixed locations are only snapshots of a larger hyperfigure that contains all such possible snapshots at once. These snapshots do not exist separately from the figure by which they are defined. Instead, they represent only possible instantiations created through arrested movement. Movement can no longer be said to refer to a translation between static coordinate points. Instead, it is defined by vectors that pass through points rather than between them. The topological figure is the domain of all such vectors and is composed of their sum.
Practically, this dialogue implies that our senses treat fixed information as a way of gauging movement rather than the other way around. For videomusic, this does a great deal to explain why many different sensory mappings between sound and image are equally engaging. The history of visual music experimentation is filled with examples of one visual property being linked convincingly with an aural one within one piece, only to be linked convincingly in an opposite way within another. If the changing values of these properties only gauge the relational movement between them, then it is no surprise that either mapping can be successful. The gesture exists not in the mapping itself but in the relational distance of its transformation as defined by its context; in the case of videomusic this context is time. Therefore, the gestures in videomusic that achieve different affects are those that manifest different relational movements over the same temporal structure, not ones in which the same movement is voiced through different values. The contributing factors towards this relational movement through time are rate of change, duration, speed, and continuity. These factors above all define the perception of gesture in videomusic.
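A minimal sketch of the invertibility point, under the same hypothetical framing as the earlier example: mapping a contour onto a visual property directly and in inverted form yields frame-to-frame changes of identical magnitude and timing, differing only in sign, which is one way of seeing why either mapping can voice the same gesture.

# Hypothetical illustration: two opposite mappings of one contour.
contour_samples = [0.0, 0.2, 0.5, 0.8, 1.0]   # a rising contour sampled over time

direct = contour_samples[:]                    # e.g. brightness rises as pitch rises
inverse = [1.0 - v for v in contour_samples]   # e.g. brightness falls as pitch rises

# The relational movement (frame-to-frame change) is identical in
# magnitude and timing; only its direction is flipped.
deltas_direct = [round(b - a, 2) for a, b in zip(direct, direct[1:])]
deltas_inverse = [round(b - a, 2) for a, b in zip(inverse, inverse[1:])]
print(deltas_direct)   # [0.2, 0.3, 0.3, 0.2]
print(deltas_inverse)  # [-0.2, -0.3, -0.3, -0.2]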
In a sense, a parallel can be traced between Noë’s research and Massumi’s. Where Noë establishes that the visual field is a whole constructed by relational movement (both that of perceived objects and that of ocular origin) and then interpreted as a fixed, rational space, Massumi does the same for orientation. In both cases, relational movement is seen to shape the perception of the metric space it inhabits. In videomusic, sound and image are joined through their dynamic temporal structuring. Time is their shared metric. Here we find videomusic’s central experience: the same factors that define the perception of gesture in videomusic shape the perception of time under such gesture.
In fact, movement is not restricted to only the visual and aural dimensions of a videomusic piece but is also implicit in their interaction. Massumi believes that the relationship between proprioception and vision is indicative of the way cross-modal perception behaves in general. In the act of navigation, these senses are linked together through a synesthetic cooperation that joins their respective dimensions “to each other, always locally — specifically, where we are lost” (Massumi 2002, 182). The way that these senses reference each other adds another dimension to their experience that Massumi refers to as a hinge-dimension. He develops this idea further by stating, “Where we go to find ourselves when we are lost is where the senses fold into and out of each other. We always find ourselves in this fold in experience” (Ibid.). The implication of this statement is that the interaction between multiple senses is continuous but not homogeneous; it contains movement and mediates the experience attributed to those senses through that movement. Such a hinge-dimension is at work in videomusic, where the synchretic relationship between sound and image is continuously evolving. In videomusic, both visual and aural senses are active and inform each other; however, the experience itself resides in this folding in between. The perceived compositional movement of a piece is not the sum of its separate sensory developments or even their combination, but instead sits within this intersection, encompassing their hinge-dimension.
Interestingly, in clinical synesthesia, the condition in which those affected experience reproducible perceptions in senses other than those being stimulated, the experience of movement is associated with this sensorial cross-referencing. Synesthetes who are able to control their perceptions sufficiently to use them as a memory aid describe the experience in terms of their own orientation and navigation through their perceptions. According to Massumi’s summary of this research: “Synesthetic forms are used by being summoned into present perception then recombined with an experience of movement” (Ibid., 186). This psychological relationship between movement and cross-sensory referencing is brought to the forefront in videomusic, where it is the movement between visual and auditory phenomena that defines the experience of the genre.
By focusing on the relational dimension between sound and image, videomusic achieves compositional direction in a significantly different way than either time-based medium alone. The tight temporal coupling of sound and image in this form offers a perceptual experience in which information derived from the aural and ocular senses is combined, each informing the other. Not only is movement experienced as a result of the visual and aural information contained within this whole, but it can also be experienced within the changing relational structure between the senses. Because movement is the relational reference for the experience of time in videomusic, this extra dimension from which to derive it offers a highly fluid temporal experience.