This month, advertising giant WPP will send unusual corporate training videos to tens of thousands of employees worldwide. A presenter will speak in the recipient’s language and address them by name, while explaining some basic concepts in artificial intelligence. The videos themselves will be powerful demonstrations of what AI can do: The face, and the words it speaks, will be synthesized by software.
WPP doesn’t bill them as such, but its synthetic training videos might be called deepfakes, a loose term applied to images or videos generated using AI that look real. Although best known as tools of harassment, porn, or duplicity, image-generating AI is now being used by major corporations for such anodyne purposes as corporate training.
WPP’s unreal training videos, made with technology from London startup Synthesia, aren’t perfect. WPP chief technology officer Stephan Pretorius says the prosody of the presenters’ delivery can be off, which was the most jarring flaw in an otherwise visually smooth early cut shown to WIRED. But the ability to personalize and localize video to many individuals makes for more compelling footage than the usual corporate fare, he says. “The technology is getting very good very quickly,” Pretorius says.
Deepfake-style production can also be cheap and quick, an advantage amplified by Covid-19 restrictions that have made conventional video shoots trickier and riskier. Pretorius says a company-wide internal education campaign might require 20 different scripts for WPP’s global workforce, each costing tens of thousands of dollars to produce. “With Synthesia we can have avatars that are diverse and speak your name and your agency and in your language and the whole thing can cost $100,000,” he says. In this summer’s training campaign, the languages are limited to English, Spanish, and Mandarin. Pretorius hopes to distribute the clips, 20 modules of about 5 minutes each, to 50,000 employees this year.
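The figures Pretorius cites can be put side by side with some rough arithmetic. The sketch below is only a back-of-the-envelope illustration: the $10,000 per-script cost is an assumed lower bound for "tens of thousands of dollars," and it assumes the $100,000 figure covers the whole 50,000-employee campaign, which the quote does not spell out.

```python
# Rough comparison using the numbers quoted in the article.
SCRIPTS = 20                      # scripts for a company-wide campaign
ASSUMED_COST_PER_SCRIPT = 10_000  # ASSUMPTION: lower bound of "tens of thousands"
SYNTHETIC_CAMPAIGN_COST = 100_000 # "the whole thing can cost $100,000"

MODULES = 20                      # training modules per employee
MINUTES_PER_MODULE = 5            # "about 5 minutes each"
EMPLOYEES = 50_000                # target audience this year

# Conventional production: at least this much, under the assumed lower bound.
conventional_cost_floor = SCRIPTS * ASSUMED_COST_PER_SCRIPT

# Volume of personalized footage implied by the campaign.
minutes_per_employee = MODULES * MINUTES_PER_MODULE
total_personalized_minutes = minutes_per_employee * EMPLOYEES

# ASSUMPTION: the $100,000 figure applies to the full 50,000-person rollout.
synthetic_cost_per_employee = SYNTHETIC_CAMPAIGN_COST / EMPLOYEES

print(f"Conventional cost floor: ${conventional_cost_floor:,}")
print(f"Personalized footage: {total_personalized_minutes:,} minutes")
print(f"Synthetic cost per employee: ${synthetic_cost_per_employee:.2f}")
```

Under those assumptions, conventional production would start at twice the synthetic budget, and it would yield 20 generic videos rather than footage addressed to each employee by name.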
The term deepfakes comes from the Reddit username of the person or persons who in 2017 released a series of pornographic clips modified using machine learning to include the faces of Hollywood actresses. Their code was released online, and various forms of AI video and image-generation technology are now available to any interested amateur. Deepfakes have become tools of harassment against activists, and a cause of concern among lawmakers and social media executives worried about political disinformation, although they are also used for fun, such as to insert Nicolas Cage into movies he did not appear in.
Deepfakes made for titillation, harassment, or fun typically come with obvious giveaway glitches. Startups are now crafting AI technology that can generate video and images able to pass as substitutes for conventional corporate footage or marketing photos. The shift comes as synthetic media, and synthetic people, are becoming more mainstream. Prominent talent agency CAA recently signed Lil Miquela, a computer-generated Instagram influencer with more than 2 million followers.
Rosebud AI specializes in making the kind of glossy images used in ecommerce or marketing. Last year the company released a collection of 25,000 modeling photos of people that never existed, along with tools that can swap synthetic faces into any photo. More recently, it launched a service that can put clothes photographed on mannequins onto virtual but real-looking models.
Lisha Li, Rosebud’s CEO and founder, says the company can help small brands with limited resources produce more powerful portfolios of images, featuring more diverse faces. “If you’re a brand that wanted to tell a visual story, you used to have to have a large creative team, or buy stock photos,” she says. Now you can tap algorithms to make your portfolio instead.
JumpStory, a stock photo startup in Højbjerg, Denmark, has experimented with Rosebud’s technology. It had already built a business around in-house machine learning technology that tries to curate a library containing only the most visually striking photos. Using Rosebud’s technology, JumpStory tested a feature that would allow customers to alter the face in a stock photo with a few clicks, including to change a person’s apparent ethnicity, a task that would otherwise be impractical or require careful Photoshop work.
Jonathan Low, JumpStory’s CEO, says the company chose not to launch the feature, preferring to emphasize the authenticity of its images. But the technology was impressive. “If it’s a portrait it works extremely well,” Low says. Results generally aren’t as good when faces are less prominent in an image, such as in a full-length shot, he says.
Synthesia, the London startup that powered WPP’s deepfake project, makes video featuring synthesized talking heads for corporate clients including Accenture and SAP. Last year, it helped David Beckham appear to deliver a PSA on malaria in several languages, including Hindi, Arabic, and Kinyarwanda, which is spoken by millions of people in Rwanda.
Victor Riparbelli, Synthesia’s CEO and cofounder, says widespread use of synthetic video is inevitable because consumers and companies have a larger appetite for video than can possibly be sated by conventional production. “We’re saying let’s remove the camera from the equation,” he says. Riparbelli says interest in his technology has grown since Covid-19 shut down many video shoots and forced some companies to launch new employee education and training schemes.
Making a video with Synthesia’s tools can take seconds. Select an avatar from a list, type the script, and click a button labeled “Generate video.” The company’s avatars are based on real people, who receive royalties based on how much footage is made with their image. After digesting some real video of a person, Synthesia’s algorithms can generate new video frames to match the movements of their face to the words of a synthesized voice, which it can create in more than two dozen languages. Clients can create their own avatars by providing a few minutes of sample footage of a person, and customize their surroundings and voices too.
Riparbelli and others working to commercialize deepfakes say they are proceeding with caution, not just rushing to cash in. Synthesia has posted ethics rules online and says that it vets its customers and their scripts. It requires formal consent from a person before it will synthesize their appearance, and won’t touch political content. Rosebud has its own, less detailed, ethics statement pledging to combat negative uses and effects of synthetic images.
Li, Rosebud’s CEO, says her technology should do more good than harm. Helping a broader range of people to compete, without large production budgets, should encourage a broadening of beauty standards, she says. Her technology can generate models of non-binary gender, as well as different ethnicities. “A lot of the users I am working with are minority brand owners who want to create diverse imagery to represent their user base,” says Li, who worked on the side as a model for more than 10 years before gaining a Berkeley PhD in statistics and machine learning and working as a venture capitalist.
Subbarao Kambhampati, an AI professor at Arizona State University, says the technology is impressive but wonders whether some Rosebud clients may use diverse, synthetic models in place of real people from minority communities. “It might lull us into a false sense of accomplishment in terms of representation without changing the ground reality,” he says.
As synthetic imagery moves into the corporate mainstream, big brands and their ad agencies will greatly influence how people experience the technology. Pretorius of WPP says his company is exploring many uses for AI-synthesized imagery, with creations so far including a Rembrandt-style portrait and digitally made models indistinguishable from real people. “We can do it technically but we’re going slowly in terms of deploying that to the market,” he says. The company’s general counsel is working on a set of ethical standards for synthetic models and other imagery, including when and how to disclose that something is not in fact what it seems.