In the Age of A.I., Is Seeing Still Believing ? | The New Yorker
In a media environment saturated with fake news, such technology has disturbing implications. Last fall, an anonymous Redditor with the username Deepfakes released a software tool kit that allows anyone to make synthetic videos in which a neural network substitutes one person’s face for another’s, while keeping their expressions consistent. Along with the kit, the user posted pornographic videos, now known as “deepfakes,” that appear to feature various Hollywood actresses. (The software is complex but comprehensible: “Let’s say for example we’re perving on some innocent girl named Jessica,” one tutorial reads. “The folders you create would be: ‘jessica; jessica_faces; porn; porn_faces; model; output.’ ”) Around the same time, “Synthesizing Obama,” a paper published by a research group at the University of Washington, showed that a neural network could create believable videos in which the former President appeared to be saying words that were really spoken by someone else. In a video voiced by Jordan Peele, Obama seems to say that “President Trump is a total and complete dipshit,” and warns that “how we move forward in the age of information” will determine “whether we become some kind of fucked-up dystopia.”
“People have been doing synthesis for a long time, with different tools,” he said. He rattled off various milestones in the history of image manipulation: the transposition, in a famous photograph from the eighteen-sixties, of Abraham Lincoln’s head onto the body of the slavery advocate John C. Calhoun; the mass alteration of photographs in Stalin’s Russia, designed to purge his enemies from the history books; the convenient realignment of the pyramids on the cover of National Geographic, in 1982; the composite photograph of John Kerry and Jane Fonda standing together at an anti-Vietnam demonstration, which incensed many voters after the Times credulously reprinted it, in 2004, above a story about Kerry’s antiwar activities.
“In the past, anybody could buy Photoshop. But to really use it well you had to be highly skilled,” Farid said. “Now the technology is democratizing.” It used to be safe to assume that ordinary people were incapable of complex image manipulations. Farid recalled a case—a bitter divorce—in which a wife had presented the court with a video of her husband at a café table, his hand reaching out to caress another woman’s. The husband insisted it was fake. “I noticed that there was a reflection of his hand in the surface of the table,” Farid said, “and getting the geometry exactly right would’ve been really hard.” Now convincing synthetic images and videos were becoming easier to make.
The acceleration of home computing has converged with another trend: the mass uploading of photographs and videos to the Web. Later, when I sat down with Efros in his office, he explained that, even in the early two-thousands, computer graphics had been “data-starved”: although 3-D modellers were capable of creating photorealistic scenes, their cities, interiors, and mountainscapes felt empty and lifeless. True realism, Efros said, requires “data, data, data” about “the gunk, the dirt, the complexity of the world,” which is best gathered by accident, through the recording of ordinary life.
Today, researchers have access to systems like ImageNet, a site run by computer scientists at Stanford and Princeton which brings together fourteen million photographs of ordinary places and objects, most of them casual snapshots posted to Flickr, eBay, and other Web sites. Initially, these images were sorted into categories (carrousels, subwoofers, paper clips, parking meters, chests of drawers) by tens of thousands of workers hired through Amazon Mechanical Turk. Then, in 2012, researchers at the University of Toronto succeeded in building neural networks capable of categorizing ImageNet’s images automatically; their dramatic success helped set off today’s neural-networking boom. In recent years, YouTube has become an unofficial ImageNet for video. Efros’s lab has overcome the site’s “platform bias”—its preference for cats and pop stars—by developing a neural network that mines, from “life style” videos such as “My Spring Morning Routine” and “My Rustic, Cozy Living Room,” clips of people opening packages, peering into fridges, drying off with towels, brushing their teeth. This vast archive of the uninteresting has made a new level of synthetic realism possible.
In 2016, the Defense Advanced Research Projects Agency (DARPA) launched a program in Media Forensics, or MediFor, focussed on the threat that synthetic media poses to national security. Matt Turek, the program’s manager, ticked off possible manipulations when we spoke: “Objects that are cut and pasted into images. The removal of objects from a scene. Faces that might be swapped. Audio that is inconsistent with the video. Images that appear to be taken at a certain time and place but weren’t.” He went on, “What I think we’ll see, in a couple of years, is the synthesis of events that didn’t happen. Multiple images and videos taken from different perspectives will be constructed in such a way that they look like they come from different cameras. It could be something nation-state driven, trying to sway political or military action. It could come from a small, low-resource group. Potentially, it could come from an individual.”
As with today’s text-based fake news, the problem is double-edged. Having been deceived by a fake video, one begins to wonder whether many real videos are fake. Eventually, skepticism becomes a strategy in itself. In 2016, when the “Access Hollywood” tape surfaced, Donald Trump acknowledged its accuracy while dismissing his statements as “locker-room talk.” Now Trump suggests to associates that “we don’t think that was my voice.”
“The larger danger is plausible deniability,” Farid told me. It’s here that the comparison with counterfeiting breaks down. No cashier opens up the register hoping to find counterfeit bills. In politics, however, it’s often in our interest not to believe what we are seeing.
As alarming as synthetic media may be, it may be more alarming that we arrived at our current crises of misinformation—Russian election hacking; genocidal propaganda in Myanmar; instant-message-driven mob violence in India—without it. Social media was enough to do the job, by turning ordinary people into media manipulators who will say (or share) anything to win an argument. The main effect of synthetic media may be to close off an escape route from the social-media bubble. In 2014, video of the deaths of Michael Brown and Eric Garner helped start the Black Lives Matter movement; footage of the football player Ray Rice assaulting his fiancée catalyzed a reckoning with domestic violence in the National Football League. It seemed as though video evidence, by turning us all into eyewitnesses, might provide a path out of polarization and toward reality. With the advent of synthetic media, all that changes. Body cameras may still capture what really happened, but the aesthetic of the body camera—its claim to authenticity—is also a vector for misinformation. “Eyewitness video” becomes an oxymoron. The path toward reality begins to wash away.