This accessible page is a derivative of https://www.researchcatalogue.net/view/1742136/1743774 which it is meant to support and not replace.

Audio description when the original page is opened: A human voice giggles weirdly.


The Elegy: Creating the Lament

Reading and listening/hearing are functions that demand a measure of patience, presence of mind and cognitive attention, whereas the ocular/visible is capable of drawing attention by affect, thriving on the non-linear and non-teleological gradient of desire.

(Shang, 2016: 244)

Audio description: An example of a default audio journey where the user has no Instagram account. Captions by Willoh Weiland and Gabby Bush. Click on https://www.researchcatalogue.net/view/1742136/1743774#tool-1744577 to hear the example.

As a medium that demands a measure of patience, presence of mind, and cognitive attention, our audio journey for Scrape Elegy had to grapple with three key issues. First, it is a representation of a visitor’s sense of self (Choi et al. 2020), a sonic narration or staging of their life, yet at the design stage we cannot know who their social media self is, let alone the content, style, or length of their captions, all of which affect the duration of the audio samples generated. Second, regardless of each visitor’s individual profile, we wanted the sound design to reinforce the affect of the physical toilet, with its sense of absurdity and exaggerated banality, a banality that questions and ultimately fractures itself. Finally, we wanted to create a dramaturgical and emotional narrative, borrowing techniques from film scoring to create an intimate emotional journey that would support the visitor’s sense of nostalgia or ‘cringe’ on hearing their own captions.

 

A critical part of the sound design was the neural voice used for reading out visitors’ captions. Wanting a gender-neutral voice with a certain deadpan ennui, we explored training our own custom neural voice with providers such as Resemble.ai, Overdub by Descript, or Amazon Polly. However, custom AI voices are expensive, and we were denied access because we could not detail what the voice would be saying [3]. While remaining mindful of contributing to the coffers of Big Tech, we settled on a free pre-trained voice from Microsoft Azure Cognitive Services, used in two different emotional modes: ‘whispering’ and ‘unfriendly’.

 

The pre-trained female voice was pitched down by 20%. To achieve variation across the vocal journey, we adjusted the rate of speaking, the volume, the emotional mode, and the gaps between sentences, with these changes pre-programmed to occur at set points in time. During the design process we trialled different section durations and parameter values to tune the emotional narrative.

Image description: A screenshot of part of the code used in the work to set the ‘neural voice’ parameters. Click on https://www.researchcatalogue.net/view/1742136/1743774#tool-1744612 to see the screenshot.
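The screenshot is not reproduced on this page. As a rough illustration only, and not the code used in the work, the parameters described above (emotional mode, pitch, rate, volume, and the gap after a sentence) could be expressed in the SSML markup that Azure's text-to-speech service accepts. The voice name, the default values, and the function name below are assumptions, as is the idea that the 20% pitch shift was applied through SSML rather than in post-production.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical voice name: the source does not say which Azure voice was used.
VOICE_NAME = "en-US-JennyNeural"


def caption_to_ssml(text, style="whispering", pitch="-20%",
                    rate="medium", volume="medium", gap_ms=800):
    """Wrap one caption in SSML with a speaking style, prosody, and a trailing pause.

    All default values are illustrative guesses, not the settings used in the work.
    """
    return (
        '<speak version="1.0" xml:lang="en-US" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts">'
        f'<voice name={quoteattr(VOICE_NAME)}>'
        # Emotional mode, e.g. "whispering" or "unfriendly".
        f'<mstts:express-as style={quoteattr(style)}>'
        f'<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)} '
        f'volume={quoteattr(volume)}>'
        # Captions are untrusted text, so XML special characters are escaped.
        f'{escape(text)}'
        '</prosody>'
        # Silence after the caption, in milliseconds.
        f'<break time="{int(gap_ms)}ms"/>'
        '</mstts:express-as>'
        '</voice>'
        '</speak>'
    )


if __name__ == "__main__":
    print(caption_to_ssml("two hearts, two hearts", style="unfriendly", gap_ms=400))
```

Escaping matters here because visitors' captions may contain characters such as `&` or `<` that would otherwise break the markup.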

Because we could not know the content of visitors’ captions in advance, we did not bring them in from the start. Instead, we used pre-rendered captions by the artist Sullivan Patten to help visitors settle into the intimate mental space (the cognitive toilet). These pre-rendered captions also supplemented the visitors’ own captions throughout the audio journey.

 

They consisted of the sounds we make in everyday conversation that usually disappear in our written texts (e.g., ‘hmmm’, sighs, and uses of the word ‘like’), as well as phrases describing emojis, which usually exist only in the textual domain and sound unfamiliar when read out (e.g., ‘two hearts, two hearts’). We also introduced whole phrases to provide bookmarks at certain narrative points, such as ‘Do not stand in my grave and weep, I am not there’, marking the dramatic midpoint, and the song ‘bye … bye … sorry’, at the end. These provide a consistent narrative structure for the different visitors, with their individual Instagram handles, and also provide anchor points for the caption selection algorithm even when a visitor has very few social media posts.
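The caption selection algorithm itself is not documented on this page. The sketch below is one hypothetical way to combine the ideas described above: open with a pre-rendered caption, place fixed anchor phrases at the midpoint and the end, and pad with pre-rendered material when a visitor has very few posts. The function name, the slot counts, and the filler list are all assumptions made for illustration.

```python
import random

# Hypothetical stand-ins for the pre-rendered captions described in the text.
FILLERS = ["hmmm", "like", "two hearts, two hearts"]
MIDPOINT_ANCHOR = "Do not stand in my grave and weep, I am not there"
ENDING_ANCHOR = "bye … bye … sorry"


def build_sequence(visitor_captions, slots_per_half=4, seed=None):
    """Return an ordered list of (source, text) pairs for one audio journey."""
    rng = random.Random(seed)
    wanted = 2 * slots_per_half

    # Pick visitor captions at random, without inspecting their content.
    chosen = rng.sample(visitor_captions, min(wanted, len(visitor_captions)))
    items = [("visitor", text) for text in chosen]

    # Pad with pre-rendered fillers when the visitor has few posts.
    while len(items) < wanted:
        items.append(("pre-rendered", rng.choice(FILLERS)))
    rng.shuffle(items)

    first_half, second_half = items[:slots_per_half], items[slots_per_half:]
    return (
        [("pre-rendered", FILLERS[0])]            # settle in before any visitor text
        + first_half
        + [("pre-rendered", MIDPOINT_ANCHOR)]     # dramatic midpoint bookmark
        + second_half
        + [("pre-rendered", ENDING_ANCHOR)]       # closing bookmark
    )


if __name__ == "__main__":
    for source, text in build_sequence(["brunch!", "#nofilter"], seed=1):
        print(f"{source:>12}: {text}")
```

With this structure, a visitor with no posts at all still receives a journey of the same length and shape, built entirely from pre-rendered material.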

 

Interestingly, a number of visitors did not realize that some captions were pre-rendered and instead absorbed them into their own social media self-image, perhaps attesting to the homogenization of language caused by ‘like’ culture, search engine optimization, and character limits (Anderson 2015: 281).


How Do You Create Banality in Sound?

As with the non-functioning, excessively pink toilet, we sought to create a type of shiny, otherworldly, something-is-wrong banality in sound. To do this, we designed an underscore of sounds to accompany the vocal captions. It starts with a simple ascending arpeggio reminiscent of a Public Service Announcement, calling the visitor into the inner sanctum of the cubicle. The first part of the journey builds on the familiar, kitsch sound of a Wurlitzer electric piano, with a pastiche Alberti bass pattern [4]. These familiar sounds and patterns are, however, made uneasy by the use of pitch slides at the ends of phrases.

 

The subtle pitch slides become large string glissandi by the middle of the audio journey, marking a dramatic point where the underscore nearly overwhelms the vocal captions. The voice becomes faster, with smaller gaps between captions, creating a sort of mental crescendo, irrespective of the actual content of the captions (which in many cases are just banal descriptions of food and the humdrum of everyday life).
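The pacing described here can be pictured as a schedule that maps a position in the journey to a speaking rate and a gap between captions. The sketch below is an assumption-laden illustration, not the schedule used in the work: the numeric values, the linear ramp, the peak at the halfway point, and the relaxation back to a calmer pace for the final section are all hypothetical.

```python
def pacing_at(progress, calm=(1.0, 1200), peak=(1.4, 300), peak_at=0.5):
    """Return (rate_multiplier, gap_ms) for a position between 0.0 and 1.0.

    `calm` and `peak` are (rate_multiplier, gap_ms) pairs; every value here is
    an illustrative guess. `peak_at` must lie strictly between 0.0 and 1.0.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0.0 and 1.0")

    # t rises from 0 to 1 on the way to the peak, then falls back to 0.
    if progress <= peak_at:
        t = progress / peak_at
    else:
        t = (1.0 - progress) / (1.0 - peak_at)

    # Linear interpolation: faster speech and shorter gaps as t approaches 1.
    rate = calm[0] + t * (peak[0] - calm[0])
    gap_ms = round(calm[1] + t * (peak[1] - calm[1]))
    return rate, gap_ms


if __name__ == "__main__":
    for step in range(0, 11):
        position = step / 10
        rate, gap_ms = pacing_at(position)
        print(f"{position:.1f}: rate x{rate:.2f}, gap {gap_ms} ms")
```

Because the schedule depends only on position and not on what the captions say, the crescendo arrives regardless of their content.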

 

The final section of the audio journey takes us quite literally into the world of laments and elegies, using the sound of a funereal pipe organ as the final accompaniment to the digital graveyard. Rather than recording ‘real’ instruments, we used sampled digital instruments as a parallel to the transcoding of our real lives into a digitized, compressed-bit version.

In everything that is to excite a lively convulsive laugh there must be something absurd (in which the understanding, therefore, can find no satisfaction). Laughter is an affection arising from the sudden transformation of a strained expectation into nothing.

(Kant 1790 [1911]: First Part, sec. 54)

Visitors’ captions are not filtered or ordered by content or emotional valence, which allows for the absurd juxtaposition of captions with the melodramatic, over-the-top underscore. As the underscore builds to its dramatic peaks or fizzles to its melancholic ending, the nothingness of endless hashtags, text-to-speech emojis, and netspeak acronyms highlights the tragicomedy of our social media selves.

Endnotes

 

[3] Microsoft Azure offers the option of training an artificial voice through an application process, and our application was returned with a request for more details of the words to be spoken by the voice. As the intention was to use the Instagram captions of visitors, we were unable to provide a script or word list to Microsoft and therefore were denied access.

[4] Alberti bass patterns are a kind of broken chord accompaniment commonly used in Classical or Romantic music.
