Forget About Siri and Alexa — When It Comes to Voice Identification, the “NSA Reigns Supreme”
Americans most regularly encounter voice identification technology, known as speaker recognition or speaker identification, when they wake up Amazon’s Alexa or call their bank. But a decade before voice commands like “Hey Siri” and “OK Google” became common household phrases, the NSA was using speaker recognition to monitor terrorists, politicians, drug lords, spies, and even agency employees.
The technology works by analyzing the physical and behavioral features that make each person’s voice distinctive, such as the pitch, shape of the mouth, and length of the larynx. An algorithm then creates a dynamic computer model of the individual’s vocal characteristics. This is what’s popularly referred to as a “voiceprint.” The entire process — capturing a few spoken words, turning those words into a voiceprint, and comparing that representation to other “voiceprints” already stored in the database — can happen almost instantaneously. Although the NSA is known to rely on finger and face prints to identify targets, voiceprints, according to a 2008 agency document, are “where NSA reigns supreme.”
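The comparison step described above can be illustrated with a toy sketch. It is not the NSA's method, which the documents do not spell out. It assumes each voiceprint has already been reduced to a short list of numbers, and it uses cosine similarity, a common way of comparing such lists. The names `identify_speaker` and `ENROLLED`, the sample values, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information to compare
    return dot / (norm_a * norm_b)


def identify_speaker(probe, enrolled, threshold=0.9):
    """Compare a new voiceprint against every stored one.

    Returns (name, score) for the closest stored voiceprint, or
    (None, score) when even the closest one falls below the threshold.
    The threshold value is an illustrative assumption.
    """
    best_name, best_score = None, -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(probe, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical stored voiceprints. Real systems derive much longer
# vectors from recorded audio; these three-number lists are stand-ins.
ENROLLED = {
    "speaker_a": [0.9, 0.1, 0.3],
    "speaker_b": [0.2, 0.8, 0.5],
}

# A new sample that closely resembles speaker_a's stored voiceprint.
match, score = identify_speaker([0.88, 0.12, 0.31], ENROLLED)

# A sample that resembles neither stored voiceprint closely enough.
unknown, unknown_score = identify_speaker([0.1, 0.1, 0.9], ENROLLED)
```

Here `match` comes back as `"speaker_a"`, while `unknown` is `None` because neither stored voiceprint clears the threshold. Because each comparison is only a handful of arithmetic operations, a search of this kind can run against a large database very quickly, which is consistent with the near-instantaneous matching described above.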
It’s not difficult to see why. By intercepting and recording millions of overseas telephone conversations, video teleconferences, and internet calls — in addition to capturing, with or without warrants, the domestic conversations of Americans — the NSA has built an unrivaled collection of distinct voices. Documents from the Snowden archive reveal that analysts fed some of these recordings to speaker recognition algorithms that could connect individuals to their past utterances, even when they had used unknown phone numbers, secret code words, or multiple languages.
Civil liberties experts are worried that these and other expanding uses of speaker recognition imperil the right to privacy. “This creates a new intelligence capability and a new capability for abuse,” explained Timothy Edgar, a former White House adviser to the Director of National Intelligence. “Our voice is traveling across all sorts of communication channels where we’re not there. In an age of mass surveillance, this kind of capability has profound implications for all of our privacy.”
Edgar and other experts pointed to the relatively stable nature of the human voice, which is far more difficult to change or disguise than a name, address, password, phone number, or PIN. This makes it “far easier” to track people, according to Jamie Williams, an attorney with the Electronic Frontier Foundation. “As soon as you can identify someone’s voice,” she said, “you can immediately find them whenever they’re having a conversation, assuming you are recording or listening to it.”
The voice is a unique and readily accessible biometric: Unlike DNA, it can be collected passively and from a great distance, without a subject’s knowledge or consent.
It is not publicly known how many domestic communication records the NSA has collected, sampled, or retained. But the EFF’s Jamie Williams pointed out that the NSA would not necessarily have to collect recordings of Americans to make American voiceprints, since private corporations constantly record us. Their sources of audio are only growing. Cars, thermostats, fridges, lightbulbs, and even trash cans have been turning into “intelligent” (that is, internet-equipped) listening devices. The consumer research group Gartner has predicted that a third of our interactions with technology this year will take place through conversations with voice-based systems. Both Google’s and Amazon’s “smart speakers” have recently introduced speaker recognition systems that distinguish between the voices of family members. “Once the companies have it,” Williams said, “law enforcement, in theory, will be able to get it, so long as they have a valid legal process.”
The former government official noted that raw voice data could be stored with private companies and accessed by the NSA through secret agreements, like the Fairview program, the agency’s partnership with AT&T.