In a post-Snowden world, surveillance is top of mind for many, as is the growing tsunami of cybercrime. Establishing secure VoIP communications is crucial to preventing eavesdropping and surveillance that can expose confidential information, as well as man-in-the-middle (MitM) attacks that can either intercept messages or inject fabricated ones. So-called “crypto phones” claim to protect against these compromises, but new research suggests that they aren’t as airtight as advertised.
“Given the surge in popularity of computing devices, ensuring the security of VoIP connections is very important for personal users, and especially for business users,” said team leader Nitesh Saxena, Ph.D., associate professor of CIS, a member of the Center for Information Assurance and Joint Forensics Research (CIA|JFR), and the director of the University of Alabama at Birmingham Security and Privacy in Emerging computing and networking Systems (SPIES) research group.
According to the UAB research, traditional means of end-to-end security for VoIP require a dedicated infrastructure and the use of public keys, which may mean placing unwanted trust in third parties. In contrast, crypto phones, such as PGPfone and Zfone, use a peer-to-peer mechanism based on cryptographic protocols and a do-it-yourself security approach. Essentially, they rely on what are known as Short Authenticated Strings (SAS), which the end users validate themselves using their voices.
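To make the SAS idea concrete, here is a minimal sketch in Python. It assumes a toy construction in which each endpoint hashes the session key it negotiated and reads out a few decimal digits; the function name, the `sas-demo|` label, and the digit encoding are illustrative assumptions, not the derivation used by PGPfone, Zfone, or any particular protocol.

```python
import hashlib


def short_auth_string(session_key: bytes, digits: int = 4) -> str:
    """Derive a short decimal string from a session key (toy construction)."""
    # Hash the key with a fixed label, then keep only a few decimal digits
    # so the result is short enough for a person to read aloud.
    digest = hashlib.sha256(b"sas-demo|" + session_key).digest()
    value = int.from_bytes(digest[:8], "big") % (10 ** digits)
    return str(value).zfill(digits)


# No attacker: both endpoints hold the same key, so the strings
# they read to each other match.
shared_key = b"key agreed between Alice and Bob"
alice_sas = short_auth_string(shared_key)
bob_sas = short_auth_string(shared_key)
print(alice_sas == bob_sas)  # True

# Man in the middle: each endpoint unknowingly shares a different key
# with the attacker, so the two strings will almost certainly differ.
alice_side = short_auth_string(b"key shared by Alice and attacker")
bob_side = short_auth_string(b"key shared by attacker and Bob")
print(alice_side, bob_side)
```

The check only works if the listener can trust that the string really was spoken by the other party, which is the assumption the attacks described in this article target.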
The problem is that human users can be easily fooled by mimicry.
Through a project funded by Cisco Systems, researchers in the Department of Computer and Information Sciences at the university demonstrated that it was possible to compromise the security of crypto phones, in both two-party and multi-party settings, with automated SAS voice imitation attacks.
There are a couple of different ways to accomplish this. The first is called a “short voice reordering attack,” and involves building arbitrary SAS strings in a victim’s voice by reordering previously eavesdropped SAS strings spoken by the victim. The second attack, called the “short voice morphing attack,” builds arbitrary SAS strings in a victim’s voice from a few previously eavesdropped sentences (less than 3 minutes) spoken by the victim.
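The logic of the reordering attack can be sketched as follows. This is a simplified illustration, not the researchers' tooling: text tokens stand in for audio clips, and the function names and sample strings are hypothetical. The attacker indexes every token heard in earlier calls, then checks whether a new target SAS can be assembled entirely from that library.

```python
from typing import Optional

Clip = tuple[int, int]  # (index of eavesdropped call, position within its SAS)


def build_clip_library(eavesdropped: list[list[str]]) -> dict[str, Clip]:
    """Record where each spoken token was first heard in the victim's voice."""
    library: dict[str, Clip] = {}
    for call_index, sas in enumerate(eavesdropped):
        for position, token in enumerate(sas):
            library.setdefault(token, (call_index, position))
    return library


def forge_by_reordering(target: list[str],
                        library: dict[str, Clip]) -> Optional[list[Clip]]:
    """Return the clips to splice together, or None if a token was never heard."""
    if any(token not in library for token in target):
        return None
    return [library[token] for token in target]


# Two SAS strings previously spoken by the victim (made-up values).
heard = [["4", "9", "2", "7"], ["1", "9", "0", "3"]]
library = build_clip_library(heard)

print(forge_by_reordering(["3", "1", "4", "9"], library))
# [(1, 3), (1, 0), (0, 0), (0, 1)]
print(forge_by_reordering(["5", "1", "4", "9"], library))
# None ("5" was never spoken in the eavesdropped calls)
```

The second case shows the limit of reordering: a token the victim never spoke cannot be produced this way, which is the gap the morphing attack fills by synthesizing arbitrary strings from a short speech sample.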
Researchers were able to design and implement the attacks using simple, off-the-shelf speech recognition/synthesis tools, finding them effective against three prominent forms of SAS encodings for voice-only calls.
Saxena’s research also confirmed that if the attacker performs the voice impersonation against SAS, users may not be able to detect this attack by looking at and analyzing the accompanying video of the call. In theory, the video component should show that the lip movement of the person voicing the SAS does not match what’s coming over the audio; but most users in the study either didn’t look at the video or couldn’t detect the mismatch between the audio and the video.
Fortunately, Saxena’s team also sought to identify potential solutions that could strengthen the underlying SAS validation process. One potential defense against these attacks is to integrate an automated voice-recognition or voice-biometrics system into crypto phones. That is, in place of, or in addition to, human voice recognition, a software component could be used to detect potential SAS forgeries.
Another potential way to thwart voice impersonation attacks is to perform the SAS validation over an auxiliary channel that is more resistant to voice and packet manipulation. For example, if the communicating devices support both an Internet connection and a cellular connection, the non-SAS communication can take place over the former and the SAS validation over the latter. This solution is particularly suited to mobile phones.
While these potential solutions could serve as useful defenses, they are not completely foolproof. Saxena’s team contends that further comprehensive investigation is needed to identify a viable mechanism that could thwart such attacks.
“We believe our findings from this project will make strong impacts — not only on networking security, but also on human-computer interaction and real-world usability,” said Maliheh Shirvanian, the Ph.D. student who led the project. “The results bring to light the threats of conceived voice privacy, and should serve as notice to users to pay careful attention to the potential security weaknesses in the future.”
Edited by Rory J. Thompson