Do you hear voices? [multi’vocal] and Syb might have an answer to what’s behind them
AI voice services shape our perception of the world, and they are here to stay. While most of these systems, such as Amazon Alexa, Cortana and Siri, are produced by tech giants and enhance a binary worldview, The New New projects [multi’vocal] by Frederik Tollund Juutilainen & Stina Hasse and Syb — Queering voice AI by Andrew Mallinson & Cami Rincòn take a different approach.
Our first question is our toughest: How would you describe your project to people who don’t know much about synthetic voices?
Frederik ([multi’vocal]): [multi’vocal] is an ongoing art and research project. Its intention is to question and explore the politics and aesthetics of synthetic speech, as well as its sonic qualities, which include how voice relates to questions of age, gender and geographical region.
Stina ([multi’vocal]): We also look at the paralanguage of synthetic speech: the tonal qualities that are concerned not necessarily with the words being said, but rather with how they are being said.
Not everyone will know what synthetic speech is. Could you describe it with a concrete example?
Frederik ([multi’vocal]): Synthetic speech is speech that hasn’t been recorded before. We start with a text input from which one can generate audio, most often using deep neural networks. Most of the time, synthetic speech models are trained with a large number of recordings from a single voice actor, which is done to make something that’s similar to the voice of that specific voice actor. Our approach is different in the sense that we continuously integrate new speakers into the process, and thus give the algorithm a hard time by presenting new ways of saying a given string or sentence.
What issues motivated and inspired you to start your project?
Stina ([multi’vocal]): One of the things that made us start wondering was listening to Siri, Alexa and Cortana. All these female-identified synthetic voices are given names with female identities. The representation of human voices is deeply shaped by those binary ideas. It made us curious to find out why — and to investigate if it could be any different.
Cami & Andrew, how would you briefly describe your project?
Andrew (Syb — Queering voice AI): Syb is a voice interface created by and for trans and non-binary people that recommends or connects them to media created by their community. In our very first workshop, it became clear that we wanted to centre the idea of trans joy. Too often, discourses and conversations around trans people, specifically in the wider media, tend to center their trauma.
Cami (Syb): Its purpose is to support the embodied wellbeing of trans and non-binary people. Not only that: it should also be an example of how to uplift trans and non-binary people and redress the structural imbalances and injustices they experience. Finding out more about this was part of my research and the interviews that I conducted.
Could you tell us more about the participants and how you started the process?
Cami (Syb): Initially, the workshop built on the research that I did for my master’s thesis, which has now been published as “Speaking from experience, trans and non-binary requirements from voice activated AI”. Through research and interviews, what I found was that, beyond representation, the most salient needs of the trans people I interviewed had to do with surveillance capitalism and privacy. Building on this, we established workshops with three requirements: relevant use cases, competent representation (meaning truly separating gender from the body the voice comes from), and a community-driven horizontal process. Everything we see today in our project Syb is driven by this process. It was special as it really took off, with a lot of trans people participating.
Andrew (Syb): The week of the workshops, which was an open call to the whole Creative Computing Institute in London, was probably one of the most intense of my life. It was in equal parts extremely rewarding, joyful, extremely demanding, busy and hectic. What was so interesting was to create a space centering on trans and non-binary people’s needs. They want to occupy this space as a collective of people. It was just a really thrilling process.
Cami (Syb): Another thing to add here is that we consulted a trans designer and co-created personas for the workshops with her. We did this in contrast to mainstream productions, which want to be inclusive but are grounded in the views of cis people.
Let’s talk about your design process. Who participates and what does the [multi’vocal] process look like? Did anything Cami and Andrew have shared about their own process and the tool they are building resonate with you?
Frederik ([multi’vocal]): We are five people in our collective, coming from very mixed, interdisciplinary backgrounds, and our project is also a kind of vessel for doing many related things. One of the initial ideas, which was a bit more romantic, was to actually make a synthetic voice. We were like: How tricky can it actually be? And then you find out that it’s complicated, and there’s a lot of stuff to it. But one of the first big things we did was to collect data through different design interventions from anonymous people reading sentences aloud: 1,400 sentences that cover most of the phonemes in the English language. When a lot of people in Denmark, where we are based, try to speak English, the performativity of language and speech becomes even more apparent. It makes you reflect on being a consumer of voices, which I think was a big thing.
Stina ([multi’vocal]): A lot of synthetic voices are recorded by one voice actor in a closed setting and we had several Aha! moments where we confronted the idea of the binary, accent recognition and paralanguage. It’s why we collected voices at festivals, where we had access to a lot of different accents and ages. We’ve done an audio paper on the methodologies we used, which also highlights some biases in relation to how synthetic voices are trained due to similarities, and the difficulties in getting a proper outcome with these differences between voices.
We are all standing on the shoulders of giants. Who do you look up to when you start your design process?
Cami (Syb): When I was doing my research, there’s a couple of people who come to mind. The first, who really came up again and again, is Dean Spade, a lawyer. He focuses on law and critical trans politics as well as the limits of law and violence within administrative processes, which I think can be very well translated to the issue of data violence — specifically, to trans people and violence that occurs within datasets and data systems. One of the things he talks about is a threefold approach to activism:
- Some approaches are abolition-oriented, because the systems they target are fundamentally unethical and incompatible with liberation.
- Other approaches are more reform-oriented: they work within systems that aren’t fundamentally harmful, but that do cause harm, in order to make them better.
- It’s also possible to create alternatives.
Thinking strategically, design requirements that the industry should take into account start with privacy as the most salient, followed by other things, such as non-consensual data collection, which should be abolished. When we think about alternatives, that’s not something which realistically is coming from big tech, which is why we want to work on creating those alternatives, little utopias, and imagining new worlds. That set the direction for Syb.
Andrew (Syb): Talking about being in rooms that aren’t necessarily only for people with a specific background in tech, that’s really the position that I come from. I don’t have that background in the slightest. I studied sculpture at university, I graduated four years ago, my background is fine art. A lot of my education and teaching was around how to consider space politically, socially, ethically. These are the questions that informed the way that I perceive the world and the work that I do. And that was my mentality and viewpoint when I came on board. So thinking about people who seem to me really important, there’s Paul B. Preciado, who is a trans philosopher, who is incredible and really changed my perspective on how I approach technology. All the learning that I did on a fine art course, the work of the Feminist Internet and my interest in feminism and in queer theory: all of these things just tie into this idea of how we can imagine and speculate on more interesting, ethical, better futures. We should all be able to contribute to the ethical, socially-conscious production of technology and what can be voiced about futures.
Frederik ([multi’vocal]): Just to add to what Andrew was saying. One thing that I found fascinating doing this project is the idea of being in technology and how you approach it. It becomes very blurred and it’s also something that’s always so contextually defined. There are some spaces where I feel totally at home, and there are others I am completely disconnected from. You can burn your fingers there. But it’s really fun to approach elements of technologies as a co-creator, because everybody’s just fumbling around. Maybe it’s a bit naïve, but if everybody is in that boat, it’s also a matter of just being content with fumbling and having the right attitude towards it, and I kind of like that.
Stina ([multi’vocal]): There’s value in creating multiple alternatives to these tech monopolies. Otherwise, what we hear in public spaces, in synthetic voices, and synthetic voice design is all the same. That’s why we experiment with open source instead of using black boxes as the tech giants do. To come back to our inspirations, Lawrence Abu Hamdan has done some amazing work on voice and listening. His 2012 project “The Freedom of Speech Itself” is an audio documentary which, through sound and forensic voice analysis, critically investigates tools that claim to determine the origins and authenticity of asylum seekers’ accents. One of the things that makes his work so brilliant is that he reveals how accent and voice are political. It is a politics not only of speaking, but also of listening, and it has been used wrongfully in so many ways. This shows again that technology is, of course, not neutral. And of course, there is Holly Herndon, who is mentoring us, which we’re very excited about. She is a great inspiration because she approaches voice design in a really artistic and sensitive way.
Who do you reach out to with your work?
Andrew (Syb): At one of our workshops, we were confronted with this question of who we are doing this for. We came to the conclusion that it is designed by and for trans and non-binary people and communities, while at the same time being for anyone who wants to expand their understanding and knowledge of gender expansiveness.
Frederik ([multi’vocal]): One of the big things that we are doing right now is building voices for a record release that will come out this fall. A big part of what’s driving that is like what Andrew is saying — anybody who wants to participate. We’re also trying to make it more approachable and engaging for people that might not come from an academic or machine learning or any other kind of background. One of the ways that we approach this is through research that we’ve done showing the absurdity of training algorithms to speak like humans, and the glitches that appear in the way that we interact.
Stina ([multi’vocal]): We probably have a pretty broad group of people that we want to reach. One of our aims in participating in all these festivals, and in having our voice box installed at many universities and organizations of different kinds, is to reach a really broad group of people, have them reflect upon what kinds of voices they want in the future, and actually get them thinking about how it could be different.
What will you take away from The New New fellowship, and how does it help you to achieve your project goals?
Frederik ([multi’vocal]): There are many elements to [multi’vocal] and we also have our other jobs going on. So just the fact that there are people who find this important and intriguing and worth listening to adds a lot, especially when you get corona-pigeonholed into something where you’re trying to find meaning in the process. It’s also very nice that we get some funding, because it means somebody believes in it. Moreover, there are the meetings we’ve had on Zoom, hearing about the work and the ideas of the other fellows. Some are very closely connected to us, even if they’re in a completely different realm, and you get inspired by their way of working. That’s something that I didn’t see coming.
Cami (Syb): I don’t want to speak for Andrew, but we have talked about that exact thing. Getting funding was obviously great, it was very exciting for us to be able to take Syb into a second iteration. Getting to know the other fellows is another benefit. A lot of their projects are quite different, but there seem to be common threads, at least ideologically speaking, when it comes to digital spaces and digital futures. Those ideologies, those narratives, are not the dominant ones. So it’s been really inspiring to connect and network with other people and just to see the creative ways that people are coming up with alternatives.
Andrew (Syb): I guess I’ll just echo what everyone said. It feels like there’s a set of shared values that have arisen within the meetings. Because the projects are so different, everyone is moving along different paths, but there’s this communal goal of trying to do something better. There’s a sense of community that I wasn’t expecting. I really, really naively thought it’d be like other schemes where it’s like, “Here’s your money, good luck, now go”. But it’s not like that at all. That’s incredible. I talk to friends quite often about the fellowship, and people I work with, and I really sing its praises. It’s been really lovely.
Stina ([multi’vocal]): Just to hear about these other projects that are so inspiring and similar, and yet so different, is great. It’s mind-blowing, you know, and gives so much inspiration. But there is also the carefulness with which you have been asking, “Okay, but what do you need in order to progress?” and “What can we do?” It’s really a continuous caring for the project, which really is wonderful.