In the opening session of his 2020 introductory course on deep learning, Alexander Amini, a PhD student at the Massachusetts Institute of Technology (MIT), invited a famous guest: former US President Barack Obama.
“Deep learning is revolutionizing so many fields, from robotics to medicine and everything in between,” said Obama, who joined the class by video conference.
After speaking a bit more on the virtues of artificial intelligence, Obama made an important revelation: “In fact, this entire speech and video are not real and were created using deep learning and artificial intelligence.”
Amini’s Obama video was, in fact, a deepfake—an AI-doctored video in which the facial movements of an actor are transferred onto the face of a target. Since first appearing in 2018, deepfake technology has evolved from hobbyist experimentation to an effective and dangerous tool. Deepfakes have been used against celebrities and politicians and have become a threat to the very fabric of truth.
How Do Deepfakes Work?
Deepfake applications work in various ways. Some transfer the facial movements of an actor to a target video, such as the one we saw at the beginning of this article, or this Obama deepfake created by comedian Jordan Peele to warn about the threat of fake news:
Other deepfakes map the face of a target person onto other videos—for example, this video of Nicolas Cage’s face mapped onto that of characters in different movies.
Like most contemporary AI-based applications, deepfakes use deep neural networks (that’s where the “deep” in deepfake comes from), a type of AI algorithm that is especially good at finding patterns and correlations in large sets of data. Neural networks have proven particularly effective at computer vision, the branch of computer science and AI that handles visual data.
Deepfakes use a special type of neural-network structure called an “autoencoder.” Autoencoders are composed of two parts: an encoder, which compresses an image into a small amount of data; and a decoder, which decompresses that data back into the original image. The mechanism is similar to that of image and video codecs such as JPEG and MPEG.
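To make the compress-then-reconstruct idea concrete, here is a minimal linear autoencoder sketched in plain NumPy. Real deepfake tools use deep convolutional networks trained on face images; this toy version uses random low-rank data and a single linear layer on each side, but the principle is the same: squeeze the input through a small bottleneck, then train the pair to rebuild the original.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 200 samples of 64-dimensional data that actually lives
# in an 8-dimensional subspace, so an 8-number bottleneck can capture it.
latent = rng.normal(size=(200, 8))
mixing = rng.normal(size=(8, 64))
X = latent @ mixing

d, k = 64, 8                                  # input size, bottleneck size
W_enc = rng.normal(scale=0.1, size=(d, k))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))    # decoder weights

def reconstruct(X):
    code = X @ W_enc          # encoder: compress 64 numbers down to 8
    return code @ W_dec       # decoder: expand the 8 numbers back to 64

def loss(X):
    return float(np.mean((reconstruct(X) - X) ** 2))

loss_before = loss(X)

# Plain gradient descent on the reconstruction error.
lr = 0.01
for _ in range(500):
    code = X @ W_enc
    X_hat = code @ W_dec
    grad_out = 2.0 * (X_hat - X) / X.size
    W_dec -= lr * code.T @ grad_out
    W_enc -= lr * X.T @ (grad_out @ W_dec.T)

loss_after = loss(X)          # reconstruction error shrinks with training
```

The bottleneck is what forces the network to learn structure: it cannot memorize every pixel, so it must discover the regularities (in face models, things like eye and mouth positions) that let it rebuild the input from very little data.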
Image courtesy of Two Minute Papers
But unlike classical encoder/decoder software, which works on groups of pixels, the autoencoder operates on the features found in images, such as shapes, objects, and textures. A well-trained autoencoder can go beyond compression and decompression and perform other tasks—say, generating new images or removing noise from grainy images. When trained on images of faces, an autoencoder learns the features of the face: the eyes, nose, mouth, eyebrows, and so on.
Deepfake applications use two autoencoders—one trained on the face of the actor and the other trained on the face of the target. The application swaps the inputs and outputs of the two autoencoders to transfer the facial movements of the actor to the target.
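In practice, most deepfake tools train a single shared encoder with a separate decoder per identity, which amounts to the same swap: the actor’s face is compressed by the shared encoder, and the resulting code is routed through the target’s decoder. A structural sketch of that wiring, with dummy untrained weights standing in for the real networks (all shapes and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

FACE_DIM, CODE_DIM = 4096, 256    # e.g. a flattened 64x64 face crop

# Dummy weights standing in for trained networks: one shared encoder,
# plus a separate decoder for each identity (actor and target).
enc = rng.normal(scale=0.01, size=(FACE_DIM, CODE_DIM))
dec_actor = rng.normal(scale=0.01, size=(CODE_DIM, FACE_DIM))
dec_target = rng.normal(scale=0.01, size=(CODE_DIM, FACE_DIM))

def encode(face):
    return face @ enc             # compress the face to a small code

def decode(code, decoder):
    return code @ decoder         # rebuild a face from the code

actor_frame = rng.normal(size=(FACE_DIM,))

# Normal training pairing: actor faces go through the actor decoder.
same_face = decode(encode(actor_frame), dec_actor)

# The deepfake trick: feed the actor's code to the *target's* decoder,
# producing the target's face with the actor's expression and pose.
swapped_face = decode(encode(actor_frame), dec_target)
```

Because the encoder is shared, the code captures identity-independent information (pose, expression, lighting), while each decoder has only ever learned to paint one person’s face over that information—which is exactly what makes the swap work.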
What Makes Deepfakes Special?
Deepfake technology isn’t the only kind that can swap faces in videos. In fact, the VFX (visual effects) industry has been doing this for decades. But before deepfakes, the capability was limited to deep-pocketed movie studios with access to plentiful technical resources.
Deepfakes have democratized the capability to swap faces in videos. The technology is now available to anyone who has a computer with a decent processor and strong graphics card (such as the Nvidia GeForce GTX 1080) or can spend a few hundred dollars to rent cloud computing and GPU resources.
That said, creating deepfakes is neither trivial nor fully automated. The technology is gradually getting better, but creating a decent deepfake still requires a lot of time and manual work.
First, you have to gather many photos of the faces of the target and the actor, and those photos must show each face from different angles. The process usually involves grabbing thousands of frames from videos that feature the target and actor and cropping them to contain only the faces. New deepfake tools such as Faceswap can do part of the legwork by automating the frame extraction and cropping, but they still require manual tweaking.
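The cropping step above is mostly bookkeeping: take each detected face bounding box, pad it with some margin so the whole head fits, and clamp it to the frame. A minimal sketch of that logic (the box itself would come from a face detector such as the ones bundled with Faceswap or OpenCV; the margin value is an arbitrary choice):

```python
def crop_box(box, frame_w, frame_h, margin=0.2):
    """Expand a detected face box by `margin` on each side and clamp
    it to the frame, returning (left, top, right, bottom) in pixels."""
    x, y, w, h = box                      # detector output: top-left + size
    pad_w, pad_h = int(w * margin), int(h * margin)
    left = max(0, x - pad_w)
    top = max(0, y - pad_h)
    right = min(frame_w, x + w + pad_w)
    bottom = min(frame_h, y + h + pad_h)
    return left, top, right, bottom

# A face at (100, 50), 200x200 pixels, in a 1280x720 frame:
print(crop_box((100, 50, 200, 200), 1280, 720))  # (60, 10, 340, 290)
```

Run over thousands of extracted frames, this produces the face-only training set both autoencoders need—though, as noted, the automated crops still tend to need manual review to weed out false detections and bad angles.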
Training the AI model and creating the deepfake can take anywhere from several days to two weeks, depending on your hardware configuration and the quality of your training data.
The Dangers of Deepfakes
Fun educational videos and custom casts for your favorite movies are not the only uses of deepfakes. AI-doctored videos have a darker side, one that has become far more prominent than these benign uses.
Shortly after the first deepfake program was released, Reddit became flooded with fake pornographic videos featuring celebrities and politicians. In tandem with deepfakes, the development of other AI-powered technologies has made it possible to fake not only the face but also the voice of virtually anyone.
The rise of deepfakes has caused other worries as well. Here’s a timely one: If anyone can use technology to create fake porn, what prevents bad actors from spreading fake videos of politicians making controversial remarks?
With reports of how social media algorithms expedite the spread of false information, the threat of a fake-news crisis triggered by deepfake technology has become a serious concern, especially as the US prepares for the 2020 presidential elections. US lawmakers have flagged deepfakes as a threat to national security and have held several hearings on the possible misuses of the technology to influence public opinion through disinformation campaigns. And we’ve seen a raft of legislative measures to ban deepfakes and hold the people who create and distribute them to account.
The Fight Against Deepfakes
Earlier deepfakes contained visual artifacts that were visible to the naked eye, including unnatural eye blinking and abnormal skin color variations. But deepfakes are constantly improving.
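Modern detectors are deep networks trained on large datasets of real and fake video, but the spirit of those early artifact checks can be shown with a toy heuristic: flag clips whose face region’s average color lurches abnormally between consecutive frames. Everything here is illustrative—the threshold is a made-up number and the “clips” are synthetic arrays, not real video:

```python
import numpy as np

def color_jump_score(frames):
    """Mean absolute change in the region's average color between
    consecutive frames. `frames` has shape (T, H, W, 3)."""
    means = frames.reshape(len(frames), -1, 3).mean(axis=1)   # (T, 3)
    return float(np.abs(np.diff(means, axis=0)).mean())

def looks_suspicious(frames, threshold=5.0):
    # Hypothetical cutoff; a real detector learns this from data.
    return color_jump_score(frames) > threshold

rng = np.random.default_rng(2)
# A "clean" clip: nearly constant color plus small sensor noise.
clean = np.full((30, 64, 64, 3), 128.0) + rng.normal(scale=1.0, size=(30, 64, 64, 3))
# A "glitchy" clip: the average color jumps from frame to frame.
glitchy = clean + rng.normal(scale=40.0, size=(30, 1, 1, 3))
```

Simple cues like this are exactly what newer deepfake models learn to suppress, which is why hand-crafted detectors keep losing ground as the generation side improves.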
Researchers have been devising new techniques to detect deepfakes only to see them become ineffective as the technology continues to evolve and yield more natural results. So as the 2020 presidential elections close in, major tech companies and government agencies have been racing to counter the spread of deepfakes.
In September, Facebook, Microsoft and several universities launched a competition to develop tools that can detect deepfakes and other AI-doctored videos. “This is a constantly evolving problem, much like spam or other adversarial challenges, and our hope is that by helping the industry and AI community come together we can make faster progress,” Facebook CTO Michael Schroepfer wrote in a blog post that introduced the Deepfake Detection Challenge. The social media giant has allocated $10 million to the industry-wide effort.
DARPA, the research arm of the Department of Defense, has also launched an initiative to curb the spread of deepfakes and other automated disinformation attacks. In addition to detecting doctored videos and images, DARPA will be looking for ways to facilitate attribution and identification of the parties involved in the creation of fake media.
Other efforts at universities and research labs range from using deep learning to detect modified areas in images to using blockchain to establish a ground truth and register trustable videos.
But all in all, researchers agree that the fight against deepfakes has become a cat-and-mouse chase. As one researcher told me last year, “Whatever we do, people who create those manipulations come up with something else. I don’t know if there will ever be a time where we will be able to detect every kind of manipulation.”