In May 2015, The Simpsons voice actor Harry Shearer – who plays a number of key characters including, quite incredibly, both Mr Burns and Waylon Smithers – announced that he was leaving the show.
By then, the animated series had been running for more than 25 years, and the pay of its vocal cast had risen from $30,000 an episode in 1998 to $400,000 an episode from 2008 onwards. But Fox, the producer of The Simpsons, was looking to cut costs – and was threatening to cancel the series unless the voice actors took a 30 per cent pay cut.
Most of them agreed, but Shearer (who had been critical of the show’s declining quality) refused to sign – after more than two decades, he wanted to break out of the golden handcuffs, and win back the freedom and the time to pursue his own work. Showrunner Al Jean said Shearer’s iconic characters – who also include Principal Skinner, Ned Flanders and Otto Mann – would be recast.
But you’ll never stop The Simpsons. After a few months, Shearer relented and signed a new deal. The show often jokes about the replaceability of voice actors in animation, but as it pushes through its fourth decade, it’s the iconic voices behind the laughter that could pose the biggest threat to its continued presence. The actors who play Springfield’s residents are approaching retirement age – they’re mostly in their 60s or 70s, Shearer is 77 – and they might soon decide they don’t want to do it anymore. They certainly don’t need the money – between fees for new episodes and residuals from repeats of old ones, they’re sitting on tens of millions of dollars.
But maybe the producers of the show don’t actually need voice actors anymore. In a recent episode, Edna Krabappel – Bart’s long-suffering teacher, whose character was retired from the show after the death of voice actor Marcia Wallace in 2013 – was brought back for a final farewell using recordings that had been made for previous episodes.
Advances in computing power mean that you could extend that principle to any character. Deepfake technology can make convincing lookalikes from a limited amount of training data, and the producers of the show have thirty years’ worth of audio to work from. So could The Simpsons replace its voice cast with an AI?
“You could certainly come up with an episode of The Simpsons that is voiced by the characters in a believable way,” says Tim McSmythurs, a Canada-based AI researcher and media producer who has built a speech model that can be trained to mimic anyone’s voice. “Whether that would be as entertaining is another question.”
On his YouTube channel, Speaking of AI, McSmythurs recasts an iconic scene from Notting Hill with Homer playing the Julia Roberts role; Donald Trump stands in for Ralph Wiggum, and Joe Biden ties an onion to his belt, which was the style at the time.
McSmythurs built a generic AI model that can turn any text into audio speech in English. When he wants to make a new voice, he tunes the model further with two or three hours of new data of that particular person speaking, along with a text transcript. “It focuses in on what makes a Homer voice a Homer voice, and the different frequencies,” he says.
After that, it’s a matter of asking the model to generate multiple takes – each one will vary slightly – and choosing the best one for your purposes. The outputs are recognisably Homer, but they sound a little emotionally flat, as if he’s reading out something that he doesn’t really understand the meaning of. “It does depend on the training data,” McSmythurs says. “If the model hasn’t been exposed to those quite wide ranges of emotion it can’t create it from scratch. So it doesn’t sound as energetic as Homer might.”
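The generate-several-takes-and-choose workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_synthesize` and `liveliness` are hypothetical stand-ins (a real model would return audio, and McSmythurs chooses takes by ear rather than with a scoring function).

```python
import random
import statistics
from typing import Callable, List

Take = List[float]  # toy stand-in for audio: one "pitch" value per word


def toy_synthesize(text: str, seed: int) -> Take:
    """Hypothetical synthesiser: each seed gives a slightly different take."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in text.split()]


def generate_takes(synthesize: Callable[[str, int], Take],
                   text: str, n_takes: int, base_seed: int = 0) -> List[Take]:
    """Ask the model for several takes of the same line, one per seed."""
    return [synthesize(text, base_seed + i) for i in range(n_takes)]


def liveliness(take: Take) -> float:
    """Hypothetical score: more pitch variation counts as more energetic."""
    return statistics.pvariance(take)


def pick_best(takes: List[Take], score: Callable[[Take], float]) -> Take:
    """Keep the take with the highest score."""
    return max(takes, key=score)


takes = generate_takes(toy_synthesize, "Mmm, forbidden doughnut", n_takes=4)
best = pick_best(takes, liveliness)
```

In practice the scoring step is a person listening to each take; the sketch only shows why asking for several varied outputs gives that person something to choose between.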
British startup Sonantic has developed a way of bringing that emotional range to AI voices. They work with voice actors to get a wide range of training data – several hours of the actors running through different lines, with different emotional tones. “We know the difference between sarcasm and sincerity, and the tiny little clues in sound,” says John Flynn, Sonantic co-founder and CTO. “We stretch those natural points and nuances and inflections.”
The amount of training data required has decreased drastically, Flynn says, from 30 to 50 hours down to just 10 or 20 minutes. Brisbane-based Replica Studios has built a model that can be trained to recreate a voice simply by being fed recordings of 20 short but specific sentences. “The more data you have the more performance you can get, but we can do something in a couple of minutes,” says Shreyas Nivas, Replica co-founder and CEO.
Words are constructed from syllables, which are built from phonemes – all the individual sounds that your mouth is able to make. In theory, a training model could get everything it needed from a single sentence known as a phonetic pangram, which contains every phoneme of English, although in practice this varies depending on your accent. (For example, try thinking of all the different ways there are to say: “The beige hue on the waters of the loch impressed all, including the French queen, before she heard that symphony again, just as young Arthur wanted.”)
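To make the idea of phoneme coverage concrete, here is a minimal sketch that checks which sounds a sentence leaves out. Everything in it is an assumption for illustration: `TOY_LEXICON` is a tiny hand-written pronunciation table in ARPAbet-style symbols, and `TOY_INVENTORY` is a deliberately small subset rather than the full set of English phonemes. A real check would need a full pronunciation dictionary for a specific accent.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, deliberately tiny pronunciation table (ARPAbet-style symbols).
TOY_LEXICON: Dict[str, List[str]] = {
    "the": ["DH", "AH"],
    "beige": ["B", "EY", "ZH"],
    "hue": ["HH", "Y", "UW"],
    "queen": ["K", "W", "IY", "N"],
}

# Hypothetical target inventory: the sounds we want the sentence to cover.
TOY_INVENTORY: Set[str] = {
    "DH", "AH", "B", "EY", "ZH", "HH", "Y", "UW", "K", "W", "IY", "N", "NG",
}


def phonemes_in(sentence: str,
                lexicon: Dict[str, List[str]]) -> Tuple[Set[str], List[str]]:
    """Return the phonemes found, plus any words missing from the lexicon."""
    found: Set[str] = set()
    unknown: List[str] = []
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in lexicon:
            found.update(lexicon[word])
        else:
            unknown.append(word)
    return found, unknown


def missing_phonemes(sentence: str, lexicon: Dict[str, List[str]],
                     inventory: Set[str]) -> Set[str]:
    """Phonemes in the inventory that the sentence never uses."""
    found, _ = phonemes_in(sentence, lexicon)
    return inventory - found


gaps = missing_phonemes("The beige hue, the queen", TOY_LEXICON, TOY_INVENTORY)
```

With this toy data the only gap is the "NG" sound, which none of the four words contains. A sentence counts as a phonetic pangram for a given inventory when the set of gaps is empty.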
Voice generation technology is already finding a use in video games – Sonantic is working with Obsidian, the makers of Fallout and The Outer Worlds, while Replica has a number of AAA and indie games studios as clients. In games, AI voices can be used to fill out an open world with a much wider range of conversations, instead of characters being limited to saying things that were recorded by a voice actor in a studio.
Nivas says the technology is particularly useful in the development stage, where an AI version of the voice can be used as a stand-in that enables the creators of the game to try out various options before getting the real actor in. It could also be used to drive increased customisation – commentators screaming your actual name on games like FIFA could be one application, while Replica developed a mod for Cyberpunk that changes the main character’s name, and enables every character that interacts with them to say it. Combining AI voice generation, speech recognition, and a text-generation model like GPT-3 could mean players can actually converse with non-player characters, with dialogue that is generated right there and then.
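The three-part combination described above (speech recognition, then text generation, then voice generation) can be sketched as a simple chain. This is a hedged illustration, not any vendor's API: `NpcVoicePipeline` and the three `fake_*` stages are hypothetical names, and the stand-ins just pass text through as bytes where real systems would handle audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NpcVoicePipeline:
    """Chains the three stages; each one is a pluggable callable."""
    transcribe: Callable[[bytes], str]   # speech recognition: audio -> text
    write_reply: Callable[[str], str]    # text generation: player line -> NPC line
    synthesize: Callable[[str], bytes]   # voice generation: text -> audio

    def respond(self, player_audio: bytes) -> bytes:
        heard = self.transcribe(player_audio)
        reply = self.write_reply(heard)
        return self.synthesize(reply)


# Hypothetical stand-ins so the sketch runs without any real models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_write_reply(line: str) -> str:
    return f"You said: {line}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = NpcVoicePipeline(fake_transcribe, fake_write_reply, fake_synthesize)
npc_audio = pipeline.respond(b"hello")
```

Keeping each stage as a separate callable means any one of them could be swapped for a real model without touching the other two.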
However, unless Fox decides to hand over scriptwriting and animation to an AI too, you wouldn’t need any of those features for something scripted like The Simpsons. And in fact, using an AI to recast a character would probably be more trouble than just finding someone who can do a pretty good Homer impression. “If the goal is to produce another episode of the show, the best way would be to get the acting cast together with a script and have them perform it – they would deliver a higher quality performance because they have been doing so successfully for decades and they can embody the characters perfectly,” says Nivas. “Using an AI voice actor would require more iterations and more work than just reassembling the cast.”
There’s also a legal minefield to navigate for any producer seeking to recast unruly voice actors with an AI. “This area of the law is thorny,” says Jennifer Rothman, a law professor at the University of Pennsylvania, and the author of The Right of Publicity: Privacy Reimagined for a Public World.
On the one hand, contracts may limit what the studio is allowed to do with the recordings. Added to that are collective bargaining issues – the actors’ union SAG-AFTRA has, Rothman says, “been very active in trying to regulate the reanimation and reuse of both voice actors and on-screen actors”.
However, in the absence of any contractual stipulations, copyright law comes into play. “Whoever owns the copyright to The Simpsons would hold all of the rights to reproduce the copyrighted works they’ve already made – including the captured recordings of the actors’ performances, and the right under copyright law to make derivative works,” Rothman says.
But this clashes with another set of laws governing the right of publicity, which varies across the United States. “This right of publicity gives the right to the performers to control unauthorised uses of their names, likenesses, performances and often also their voice,” Rothman says.
There’s also, says Johanna Gibson – a professor of intellectual property law at Queen Mary, University of London – a potential recourse for the actors in a false endorsement claim. If The Simpsons used a deepfake Homer to advertise chocolate bars, it could be seen as a personal endorsement by the actor Dan Castellaneta. The law could also, Gibson says, vary even between different characters played by the same actor on the same show – she uses the example of Seth MacFarlane from Family Guy, whose ‘Brian’ voice is his actual speaking voice and is likely to have more protections, while Stewie is a voice created specifically for the show. (Of course, in this instance, MacFarlane is the creator of the show and is unlikely to be replaced by an AI against his will.)
In 1993, two actors from Cheers – George Wendt and John Ratzenberger – sued Paramount for using their likenesses for robotic versions of their characters used in airport bars. The actors argued that the right of publicity gave them control of their own image; the studio argued that copyright law allowed it to create derivative works based on the sitcom. The case dragged through the courts for eight years and the studio eventually settled for an undisclosed fee. “The law is unclear, which suggests that if the contract doesn’t say the studio can do it then it is uncertain how such disputes would come out if litigated,” says Rothman. “It’s an unresolved issue. The legal framework for resolving these cases is quite a mess.”
But voice actors probably don’t need to get on the phone to their lawyers just yet. None of the people making these voice generation tools are doing so with the purpose of replacing actors. Both Sonantic and Replica are keen to stress that they work with actors, and that they have revenue-sharing models in place so that the voice actors make money every time their ‘voice’ is used in a game.
As this technology improves and the voices it creates move out of the “uncanny valley”, they could, says Nivas, help democratise content creation – allowing fans of The Simpsons to legally use the voices of their favourite characters for their own projects, for instance, to make mashups and remixes that breathe new life into a tired show.
Zeena Qureshi, the CEO and co-founder of Sonantic, likens current voice generation tech to the early days of CGI. “It replicates an actor’s voice but it’s not going to replace them,” she says. “CGI didn’t replace cinematographers, this isn’t going to replace actors, but it helps them work in person and virtually. If someone retires their voice can work for them.”
McSmythurs also draws a comparison with CGI, and says that although you could make a convincing episode of The Simpsons today (with a lot of iteration and effort), it might struggle to stand the test of time – in the same way that CGI films from the 90s look dated to modern eyes. He sees a use for the technology in short snippets – things like reviving a character played by a deceased actor for a final farewell – but doesn’t think an AI cast will be a practical route any time soon. “The voice actors are bringing more to it than just a voice, they’re bringing that emotional content,” he says. “Dan Castellaneta imbues this 2D character with warmth, depth and all the qualities that make us like him. Humans do a very good job of being human.”