LISTENING AND SPACE

Sean Dockray

Bernard Tschumi, in writings between 1975 and 1991, articulated what he called a disjunction in the field of architecture between the form of buildings and the events that take place within them. This contradiction might be replaced by any of a set of related contradictions (theory/practice, ideal space/real space, object/subject), including one that in a certain sense suggests each of the others: that between the visual and the aural.

Any essay that proposes to discuss sound usually begins by cataloguing the injustices done to listening, as a result of centuries of neglect by vision-inclined philosophers and art historians. Usually, this catalogue slips into a new set of oppositions, reversed so that listening becomes the mode of perception that brings us closer to the truth of existence. Reading this literature, I am reminded of the workers, in the film Brazil, who share a desk that penetrates the wall between their respective offices - they alternately nudge, hold, and yank the desk, such that the gain of one man is at the direct expense of the other. It sometimes seems as if perception is a kind of zero-sum game: territory claimed by the proponents of sound methodically excludes vision, as if hearing and sight exhausted the human sensorium. What follows is a series of oppositions between listening (on the left) and vision (on the right) that come out of these debates:

Listening        Vision
spherical        directional
immersive        perspectival
inward           outward
interiors        surfaces
contact          distance
subjectivity     objectivity
life             atrophy and death
affect           intellect
temporal         spatial
magic            neutral

(Jonathan Sterne, summarizing Walter Ong, in The Audible Past, 15)

The purpose of this essay is to move beyond such facile (and misleading) oppositions by applying three writings on vision or listening that explicitly or implicitly open one up onto the other: Don Ihde's Listening and Voice; 'The Element of Voluminousness': Depth and Place Re-examined by Edward Casey; and Norman Bryson's The Gaze in the Expanded Field. My goal is not to reduce the particularities of seeing and hearing to an indistinguishable sameness, but rather, to use the spirit of the aforementioned writings to discuss several examples that bring the act of listening into focus. The necessary obverse of this inquiry is to also describe what happens to the act of seeing.

One of these examples is my own and it is most likely the reason that I am now writing on this theme. Nearly two years ago, a friend and I decided to begin producing a radio show that would explore radio as a medium for architectural representation. In other words, it would be a radio show about architecture that would not be in the generic host-guest interview format, but instead would be architectural itself. From the very beginning, we were thrown into the fundamental disjunction that opened this essay. By working with sounds - as opposed to photographs, renderings, drawings, models, or sketches - it was practically impossible to represent architecture as purely formal and timeless. Instead, architecture was opened up to the ongoing flow of events, to the uses and abuses of its inhabitants, and to the irreducible presentness of situated experience. Ultimately, I am interested in space, and the other examples that I will address in this essay have been selected as variations of how sound might affect one's experience of space. Because space is a theme that cuts across disciplines, these examples represent a range of interests and address very different audiences.

What exactly does it mean when I write that I selected examples that bring the act of listening into focus? This is an intentional attempt to set this investigation off in a phenomenological direction. In a certain sense, this is exactly what Don Ihde seeks to do at the beginning of Listening and Voice, when he transitions from a cursory description of experimental phenomenology, which he illustrates with a diagram representing the structure of vision, to the beginning of a description of The Auditory Dimension. He fixes his gaze on a box of paperclips, such that they are held in the center of his vision, in the focal core. The stationary box is, of course, mute. Suddenly, a common housefly buzzes across his field of vision. For a moment, the self-same object (the fly) exists within both the visual and aural horizons. It is seen and heard. The box of paperclips, on the other hand, stands beyond the horizon of sound (Ihde, 50) even while it stands within the horizon of vision. In the instance of the fly, sound and the moving object overlap with one another. Ihde eventually arrives at two diagrams, side-by-side: one shows the region of sight, bounded by the horizon of invisibility; and the other depicts the region of sound, which is circumscribed by a horizon of silence. The paperclip box is represented in the first diagram, but not the second. The fly, on the other hand, is shown as an arching vector in both diagrams (although it is shaped differently and not located in the same part of the region). A third example, the wind, is shown only in the region of sound presences. The passage culminates with the two regions literally overlapping, like a Venn diagram.

Although Ihde's approach is quite clear and logical, intended to ease the reader into a fuller description of the act of listening by departing from what is known (that is, the structure of the visual field), the idea of bringing listening itself into focus might suggest an integrated field, rather than two discrete but intersecting fields. Ihde himself makes this move in Experimental Phenomenology when he writes that he, rather than attend[ing] solely to vision (56), is able to concentrate... attention upon the visual dimension. The other phenomena do not disappear, their recalcitrant presence remains, but they recede to the fringe of awareness. (56-57) In these introductory texts, however, the other phenomena remain at the fringe of awareness, while Ihde continues to concentrate on the focus (be it visual perception or listening and voice) to such a degree that the focus itself becomes a field. By discussing examples that bring the act of listening into focus, I will try not simply to be mindful that this concentration is not exclusive and does not abolish the full range of global presence (Experimental Phenomenology, 57), but to continue to describe what happens to vision even as it recedes. Although my analysis does not claim to represent the full range of global presence, it does attempt to concentrate on the combination of listening and vision, and posit the structure of an aural-visual field.

My first example is deceptively simple - I have chosen it because it aspires to an experience of pure listening. In Disney-MGM Studios, a theme park adjacent to Walt Disney World, the reality of Hollywood blends into Disney fantasy. There, Sunset Boulevard and Hollywood Boulevard intersect near Echo Lake where, just past a giant dinosaur, one finds Sounds Dangerous at the ABC Sound Studios. This amusement provides visitors with an experience of three-dimensional sound featuring the comedian Drew Carey. Visitors file into a dimly lit theater, finding a pair of headphones on each seat. The attraction begins with a short video that introduces the narrative: Drew Carey plays a hapless undercover detective named Charlie Foster who promptly damages the camera that is hidden in his necktie. At this moment, the projection (which is understood to be the video from Foster's spycam) cuts out and the theater is left in complete darkness and, for most of the remaining ten minutes or so, the audience only has access to the audio from the headphones.

Similar to Lady in the Lake and Dark Passage, Sounds Dangerous uses the subjective camera technique to let the audience see from the point of view of the main character. Unlike these two films, however, the Disney attraction employs an analogous 'subjective microphone' so that each audience member will see and hear everything the talent sees and hears, as though you were right there with him when it happens. (Quoted from the Sounds Dangerous script) To push the illusion one step further, Sounds Dangerous creates an elaborate contrivance wherein the action is happening in real time, at that moment. Before the video, a Production Team Member in the theater instructs the audience about how to position their headphones and promises, "We're going to put you in the middle of the action... live." Upon receiving word that the director and Drew Carey are ready, he instructs, "Cue the opening," and the director appears on screen to address the audience, once again insisting that this is live television and nobody knows what will happen. Finally, the screen is given over to Detective Foster's spycam.

Of course, excepting the roles played by Disney cast members in the theater, none of the action is live. In only 12 minutes, Charlie Foster infiltrates a snow globe company, escapes a swarm of killer bees, drives across town, gets a haircut, crosses the street to a circus, survives a knife thrower and a urinating elephant, and finally cracks the case of the diamond smuggler. This compression in time is the likely byproduct of a tension between the purpose of Sounds Dangerous - to showcase a variety of sound effects - and the demands of competing attractions within the Disney parks, which help produce the normative preference for narrative within them. But perhaps this explanation proceeds too quickly and gets in the way of finding something interesting about sound and darkness.

Reading the online reviews (from epinions.com) of the Sounds Dangerous attraction, I am struck by the ambivalence of opinions: while many visitors found the experience literally intense and entertaining, a substantial number of the reviews called it boring and a waste of time. Most, following the description of the theme park, warned that the attraction should not be visited by those who are afraid of, or bothered by, the dark. Every single one of the reviews made reference to the darkness, calling it complete, pitch black, or total. Generally, the darkness was not commented upon beyond a description, except by those who had a negative experience. One seemed annoyed that when the lights went out, the children in the audience began screaming (perhaps in genuine fear or, as is very likely, as a playful response to an unusual situation). Recalling my own childhood, I probably would have been one of the children who reacted with exhilaration, apprehending a kind of freedom and anonymity in the suddenly transformed environment. Nevertheless, as a self-described mature adult, I can now sympathize with the aggravated reviewer: I expect an immersive, aural experience from the attraction, and the drastic change in lighting resonates with my sympathetic attitude. The giddy screams of children come to me through the headphones, puncturing the artificial horizon that was constructed by device of the headphones (and reinforced by my willingness to participate in the fantasy). In a sense, the real world intrudes on my intimate experience.

I can begin to define an unusual audible field, introduced by the artifice of the headphones. Earlier, I summarized Don Ihde's diagram of the region of sound presences that was bounded by a horizon of silence, beyond which existed the realm of objects that are silent-for-us. When I put on a pair of headphones to listen to music, for example, this horizon is disrupted, contracting in towards me. Sounds that had previously been barely audible suddenly lie outside of this horizon, silent. And as I adjust the volume, making the music louder and then softer, the horizon shrinks and then expands. But isn't there a second horizon, implied by the music I am listening to, which also changes in size with volume level? When the music is very soft, much of it is effectively absent for me and I am still able to hear sounds from the outside world. But if the volume is very loud, I can hear nearly every note (excluding some of the higher and lower frequencies) while the sounds from my immediate surroundings, even a shouting neighbor, fall silent. Because music is typically not a single frequency at a fixed volume - but instead an ongoing rush of sound, rising, falling, and pausing over time - the horizon of silence for the outside world is put into a constant flux. Through my visual perception, I can be aware of the sounds I am missing by comparing what I hear to what I see; for example, I can reasonably assume that I am not-hearing a voice when I see my neighbor's animated mouth. To a certain extent, I am aware of the world beyond my horizon of silence, a horizon that has been actually and momentarily compressed by the expansion of the horizon of silence produced by my headphones and music.

What is the use of insisting on two coexisting horizons, when the experience could be represented by only one? Certainly, the sounds from my headphones could exist in the same region, bounded by the same horizon (that is, a singular silence). Although this is true, the model I propose does not represent pure hearing, but instead a combination of the aural and visual. The headphones introduce a kind of intermediary space between the horizon of what-is-heard and the horizon of what-could-be-heard. I can lose myself in the intimate world of my headphones, but beyond the horizon that they produce is a world that I continue to sense as present (if not immediate).

Sounds Dangerous introduces a simple, but powerful, twist on this model that can be approximated by closing one's eyes. The intermediate objects that could-be-heard disappear. As reasonable people, we will not spontaneously forget our surroundings, but as surrounding elements fail to register in our senses, we begin to lose touch with reality. This can be a more or less successful departure, depending on the context. For example, if I am standing on a busy sidewalk, listening to music through my headphones with eyes closed, I remain conscious of (and perhaps nervous about) the people walking past me. In the comfortable, spacious theater, however, I am more relaxed and the intermediate objects slip away more easily. Of course, if audience members begin screaming as the lights go out, those objects that could-be-heard do not disappear, but intrude on the intimate experience, reinscribing the outside world in the space of the attraction.

It is worth remarking that this particular structure does not correspond to a possible immediate, intuitive one that is encouraged (not necessarily on purpose) by the Disney-MGM literature. In short, an audience member will alternately make sense of the amusement by seeing and hearing, depending on whether the spycam is operational or not. Here, I am using seeing as a substitute for the normative mode of watching projected entertainment in a theater (that is, both seeing and hearing). The presumption is that when the spycam cuts out and the projection disappears, the visual is lost completely and the audience must rely solely on the show's audio component. Sounds Dangerous becomes something like a multi-stable illusion, where one alternates between one mode and the other, seeing or not seeing. What this intuitive structure fails to take into account is the way in which seeing-darkness cannot be dissociated from the act of listening, especially with respect to the relationship between the dual horizons of silence. Furthermore, seeing-darkness is an intensely personal activity that often produces an assortment of entoptic effects, from floating things in the field of vision to moving splotches of color. For some, these are a reminder of one's bodily presence and distract from the illusion of being transported from it. Symptomatically, one reviewer wrote that they felt stuck 'hearing.'

This immediately brings to mind John Cage's notorious piece 4'33", which was a composition of four minutes and 33 seconds of silence, in three movements. Not surprisingly, the audience reaction ranged from bemusement to anger, including (I will project) feelings of being stuck. If I consider how the piece might conform to the model presented earlier: the live music performance differs from the experience of listening to music through headphones; nonetheless, in a cursory way, I can still diagram the experience in terms of two horizons - for the sounds from the musical instruments and for the rest of the world - as well as a field of effectively silent, but visually apparent objects. Cage's composition equalizes the horizons of silence so that those intermediary objects are drawn into the piece. The visually present, but silent physical movements of the performer at the times determined for the compositional movements of the work reinforce the horizons as equal, yet distinct. Although there is nothing being played, 4'33" is a piece, an identifiable event. As a further analogy to the personal, embodied experience of darkness mentioned above, Cage cited a visit to an anechoic chamber as an influence on his work. There, rather than hearing silence as he expected (because the room was designed to absorb all sounds instead of bouncing them back as echoes), Cage simultaneously heard one low sound and one high sound; an engineer informed him that these were his circulatory system and nervous system in operation. Between the noises of the perceiving body and the horizon of silence lies the depth of sound-in-the-world, a place for sounds (appropriating Edward Casey's visual concept in his essay, The Element of Voluminousness, 17). In this sense, the idea of silence is as nonsensical as being without being in a place.

Disney-MGM was not attempting anything as radically open as John Cage's work, yet each used a kind of contextual emptiness to produce a singular horizon for each audience member: Cage expanded the inner horizon of the performance to the limits of the outer horizon of the world, so their contained sounds perfectly coincide; and Disney pushes the sounds of the outside world to the fringes of perception and beyond, such that the sounds of the attraction become the totality of aural experience.

Another reviewer of Sounds Dangerous reacted negatively to the darkness, not because it instigated responses that undermined its intended effect, but because it gave the person a sense that their time was spent with nothing to show (my italics) and ended up feeling gypped. This feeling that the attraction was an underwhelming waste of time was quite common amongst those who expressed dislike for it. These are, perhaps, predictable responses, particularly considering the cultural preference for that which is visual. But if this is the case, why did so many people like it? Why is this boring, non-visual attraction sometimes called the hidden gem of Disney-MGM Studios?

The most common observation made by the enthusiastic reviewers (as well as some of the unhappy ones) is the sensation that they could literally feel the sounds. One person swore that they could feel warm air from the barber's blow drier; another said, you actually get the ticklish feeling of having the back of your neck shaved; and one described the feeling as more immediate, you can feel the blades snipping away right next to your ears. Regarding the killer bees, reviewers commented that you will actually be swatting at the buzzing bees; that it feels like there are bees buzzing right next to your ears; and you hear the buzzing sound so clearly that it feels as though the bees are right near the back of your neck (this from the same person who felt ticklish at the back of the neck during the barbershop scene). These were by no means unusual or passing remarks. In fact, one person actually titled their review, You can feel the sounds.

Is it possible that the large majority of people who reviewed Sounds Dangerous actually felt something that wasn't there? This would certainly be a powerful experience that would support the warning offered by one visitor who said that children might find the realistic audio too intense. If, however, I choose to disregard the practical unanimity of this description and consider it merely colorful language, it is still interesting that very few visitors made the expected claim that they could see the things that weren't there (for example, one person wrote that the amazing audio effects painted a vivid picture of what Drew was seeing, which doesn't seem to describe immediate experience so much as explain what was supposed to be happening). Alternatively, these observations might be something more than inaccurate descriptive language: reactions to an actual, common synesthetic experience. Perhaps we should take the reviews at face value and accept that audience members really felt a sensation when stimulated aurally. If this is true, then the portion of the program where water was sprayed on the audience, at the moment that an elephant was heard urinating, is redundant. Curiously, this gag was mentioned in only one review, suggesting that the sounds alone were found more compelling than sound combined with water. Like the screaming children in the audience, the tactile stimulus - rather than augmenting the experience, immersing the visitor deeper in Drew Carey's world - undermined the effect, intruding on the intimate experience.

The three-dimensionality and realism of the sounds is a product of the binaural recording method. At a superficial level, binaural is similar to common stereo recording - they each create a sense of spatiality and of surrounding the listener. In the case of studio music, the stereo recordings are usually made with the microphones spread several feet or more apart from one another. As a result, these don't sound correct when one listens through headphones. Although listening to the same recording through speakers better approximates the recording conditions, the acoustics of the room will destroy some of the realism, particularly as sounds from the left channel reach the right ear and vice-versa. Binaural recordings, on the other hand, are typically unencumbered by the latter problem of simple stereo because they are meant to be listened to with headphones. Furthermore, these recordings are done with two microphones placed in the ears of a real or dummy head, such that sounds from the surroundings reach the microphones at slightly different times and with transformations based on the physicality of the head and pinnae. The result is a recording in which listeners are able to locate the sounds they hear spatially and in which, based on the reviews of Sounds Dangerous, some sounds take on a real tactility.
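One of the cues that a dummy-head recording preserves, the slight difference in arrival time between the two ears, can be approximated with a little arithmetic. The sketch below uses Woodworth's spherical-head approximation; the head radius and speed of sound are assumed typical values, the function name is hypothetical, and nothing here describes how Sounds Dangerous was actually recorded.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed typical adult head radius, in meters
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 C


def interaural_time_difference(azimuth_deg):
    """Approximate arrival-time difference between the ears, in seconds.

    azimuth_deg is the angle of a distant source away from straight
    ahead, from 0 (in front) to 90 (directly to one side). Uses the
    Woodworth spherical-head formula: (r / c) * (theta + sin(theta)).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers azimuths from 0 to 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        itd_ms = interaural_time_difference(angle) * 1000.0
        print(f"{angle:>2} degrees: {itd_ms:.3f} ms")
```

Under these assumptions the difference is zero for a source straight ahead and grows to well under a millisecond for a source directly to one side. Widely spaced studio microphones do not reproduce delays on this scale, which is one way of stating why ordinary stereo sounds wrong over headphones.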

Drew Carey wore special microphones in his ears - which supposedly were capable of detecting a mere change in pressure in the room - for the audio component to Sounds Dangerous. In the context of the narrative, the audience is told that the undercover detective is wearing a surreptitious camera and microphone. The camera, which is primarily not functional, has an aestheticized low quality that one might expect for an undercover detective plot. At times throughout the show, the audience is made aware of the hidden camera: when Carey brings a snow globe close to his body; when he looks at his reflection in the mirror; and especially, when he places the camera in his mouth, damaging it. The hidden microphone, on the other hand, is only referred to in the introduction and is neither shown nor manipulated by Carey, and the sounds produced by this microphone do not draw attention to the mediating technology (as the unnaturalness of the spycam does). Instead, one has a very real, clear aural experience, which would be impossible if Detective Foster really was an undercover spy wearing undercover equipment: binaural microphones are quite conspicuous. Nevertheless, this suggests an interesting possibility: if the experience of watching the projected image gives one the dual sense that they are looking through the screen as well as looking at the screen, then is there something about the projected mono-image (which is already associated with a particular miniature device) that might undermine the very immersive sense that the audio gives?

My second example follows obliquely from this question. How does sound figure in immersive facilities like virtual reality rooms? Might these facilities - which can provide active stereo images (for a deeper sense of three dimensions) and a camera free from the material limitations of an optical video camera (including the capability to move responsively in real-time) - take Sounds Dangerous out of darkness? It should be noted that sound usually does not figure into virtual reality simulations and where it does, the audio rendering quality is significantly less mature than the graphics rendering. At the UCLA Visualization Portal, the site of my own limited experience with these technologies, most of the models were silent, and for the one that wasn't, the sound seemed to be an afterthought, or an appendage.

This particular model was an approach and walk through the Cathedral of Santiago de Compostela. My initial reactions to the sound were: first, that it was muddy and flat; second, that I couldn't get any real sense of directionality; and finally, that it seemed somehow detached from the images, as if it wasn't stuck enough. Supposedly, as the camera moved through the digital model, sounds (which had an actual location in the space of the model) were increased or decreased in volume, so that through motion one would be able to discern the location of the sound sources. Perhaps this works better with synthetic sound sources (which I have yet to experience, but which may still be subject to parts of the following analysis) than with field recordings deployed in digital space, as in the Santiago de Compostela walkthrough.

As the camera (and by extension, we) moves through the space of the model, we hear the sounds of a place filled with people; yet, there are only a handful of people in the entire model - people who are static and flat. Visually, the structures and ground of the model also seem somehow flat. Although the buildings seem geometrically accurate, occluding one another believably in tandem with our motion, the texture-mapped surfaces seem to be lacking a certain density or substance. Is this a property of the rendering techniques? Might they be advanced to the level that the simulation is as substantial and dense as a filmic image digitally projected? Perhaps. In contemporary popular cinema, computer simulations regularly produce images that are as realistic as images of real space. Then what is particular to this experience of wandering through the fictional space of a digital simulation that contributed to my feelings of flatness and a detachment of what is heard from what is seen?

In spite of my slight disappointment with the immersiveness of my visual experience (part of which might be a symptom of only two-thirds of the screen being available for our demonstration), I perceived more depth in the images than in the sounds. At the same time, however, the sounds were recorded in real places, with their unique acoustics and reverberations. In a certain way, they were fuller and more real, if not as clear, as the visual model. But two adequately immersive simulations - one produced visually by a digital model, and the other aurally by field recordings processed by the same model - failed to add up to something even more immersive. Rather than giving the audience two conduits for perceiving the same space, the simulation offered two different spaces: the space of the forms (composed of surfaces and light) and the space of events (made of movement, action, use, noise, and the sonic play of materials), one static and timeless, the other animated and timeful.

The intentional viewpoint (camera and, by extension, microphone) of the simulation is programmed to integrate these two spaces into a unified, whole experience. With every move, a new visual perspective is calculated in relation to the forms, light, and shadows. Similarly, a sonic perspective, based on the Cartesian position, is used to adjust amplitudes and panning on a database of sounds. Nevertheless, there isn't a sense that the sounds interact with the surfaces, that the visual model produces those reverberations in the audio. Furthermore, a rotation of the viewpoint to the left doesn't create as strong a change in the field of sound presences (and a corresponding reorientation of the horizon of silence) as one would expect, leaving one feeling out of the world of the sounds, rather than immersed in it. Perhaps the model itself must be enhanced for simulating sound: assigning acoustic materiality to surfaces; modeling the life of a place (its people, the wind, the vegetation, animals and machines). And maybe the viewpoint, if it is to be the opening for an immersive experience, must be modeled as a body rather than simply a point.
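The positional adjustment of amplitude and panning described above might be sketched as follows. This is a hypothetical, minimal model and not the Visualization Portal's actual implementation: the function name, the inverse-distance attenuation, and the constant-power pan law are all assumptions chosen for illustration, and the listener is deliberately reduced to a point with a heading.

```python
import math


def stereo_gains(listener_xy, heading_rad, source_xy, ref_dist=1.0):
    """Return (left_gain, right_gain) for one sound source.

    listener_xy and source_xy are (x, y) positions in the model;
    heading_rad is the direction the viewpoint faces, measured
    counterclockwise from the +x axis. All names are hypothetical.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]

    # Assumed inverse-distance attenuation, clamped so that sources
    # closer than ref_dist are not amplified without limit.
    dist = max(math.hypot(dx, dy), ref_dist)
    gain = ref_dist / dist

    # Angle of the source relative to the heading; a positive angle
    # means the source lies to the listener's left.
    rel = math.atan2(dy, dx) - heading_rad

    # Pan position from -1 (hard left) to +1 (hard right).
    pan = -math.sin(rel)

    # Assumed constant-power pan law: map pan onto a quarter circle.
    theta = (pan + 1.0) * math.pi / 4.0
    return gain * math.cos(theta), gain * math.sin(theta)


if __name__ == "__main__":
    ahead = stereo_gains((0.0, 0.0), 0.0, (2.0, 0.0))
    behind = stereo_gains((0.0, 0.0), 0.0, (-2.0, 0.0))
    print("ahead: ", ahead)
    print("behind:", behind)
```

In this sketch a source directly ahead and a source directly behind at the same distance yield the same pair of gains, and no surface in the model affects the result. Both limitations are consistent with the suggestion that the viewpoint would need to be modeled as a body, and the surfaces given acoustic properties, before the sound could feel attached to the space.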

All of these are ambitious, expensive suggestions. But are there other ways that one can represent (or present) the life of a space? Is it a matter of pushing the technological limits of sensorial stimulation to the point where a machine, operating on the body, can deliver a fully developed, totally immersive simulation? My interest is in how sound gives one a sense of space (and of place). The previous examples that I've given have dealt with advanced recording (and synthesis) techniques in order to immerse the listener, by virtue of the realism of the sensorial stimulations. However, with Building Sound, the radio show that I help produce, we broadcast in mono at very low quality. No one has ever complained (nor expressed delight) that they could feel the sounds from one of our shows. Rather than try to reproduce the actual experience of hearing, we try to use the meaning of sounds to stimulate the imagination. I imagine that Gaston Bachelard would be sympathetic to our cause: he suggested in an essay entitled Radio and Reverie that radio would be ideally suited for exploring archetypes like the home. Maybe he would appreciate the possibilities of reminiscing in the dark clarity of Sounds Dangerous; or perhaps he would prefer the warmth and familiarity of the degraded signal. In either case, sound plunges us into lived space, where the distinction between inside and outside, subject and object, form and event, disintegrates. Through listening, we might learn how to experience the world that Keiji Nishitani wants us to see: a mobile continuum that cannot be cut anywhere (Norman Bryson, The Gaze in the Expanded Field, 97) in which each object opens out omnidirectionally (100) such that the ground of its being is the existence of everything else. (98)