March 22, 2012
When Voice Recognition Applications are Bad, Don't Blame the Speech Engine
Lots of people are excited about speech technology nowadays, and for good reason. While speech recognition software has been used in consumer applications for a few decades now (think of interactive voice response, or IVR, that uses voice input instead of “press one” and “press two”), newer applications are changing the way we interact with technology. Apple's Siri personal assistant has brought the technology rushing to the forefront, as have automobile-based telematics systems like OnStar and others. Since we, as a nation, are always on the go and looking for the easiest possible interface, speech technology is likely to begin showing up in places it never was before: in our home entertainment systems, in video gaming, in security systems, in more moderately priced cars, in toys and in our homes. (As in, “turn on the living room lights and raise the air conditioning temp to 72, please.”)
It's important to understand that there are two aspects of speech technology. A recent interview in The Motley Fool with Jim Greenwell, CEO of a small speech application company called Datria, emphasizes this “dual-process” nature of the voice interaction industry. There is the speech engine itself, a complex platform programmed to recognize the human voice, and then there are the applications built on top of the speech engine. Datria builds its applications on the speech engine offered by industry leader Nuance Communications (which is also behind Apple's speech capabilities).
Said Greenwell, “Nuance clearly has the best [speech] engine out there.” He notes that it works in more than 60 languages and has underlying linguistic algorithms to recognize small “phonetic sound bites,” which vary greatly between languages.
While speech engines take decades to build (in other words, few companies are going to start the process today and expect to see success with it by next year), speech applications are becoming numerous. The problem, says Greenwell, is that many speech applications aren't very good, which creates the perception that voice recognition technology itself is ineffective. This isn't the case, he says. It's the applications that can be faulty.
“Many...companies license Nuance's engine, and the recognition side works great, but if the application (which is frequently built in-house) isn't programmed to interpret all the ways you can say 'yes,' then it might not realize that 'absolutely' means the same thing,” notes Greenwell. “This leaves many users with an impression that voice-recognition technology is broken, when it's really the application software layer that needs improvement.”
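To make Greenwell's example concrete, here is a minimal sketch of what the application layer has to do once the engine has returned a transcript. It assumes the engine hands back plain text; the function names and the synonym lists are hypothetical illustrations, not part of Nuance's or Datria's software.

```python
import re

# Hypothetical synonym lists; a real application would tune these per prompt.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "absolutely", "correct",
               "that's right", "ok", "okay"}
NEGATIVE = {"no", "nope", "nah", "negative", "incorrect", "not at all"}


def normalize(transcript: str) -> str:
    """Lowercase the transcript and drop punctuation and extra spaces."""
    text = transcript.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)    # keep letters, apostrophes, spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


def interpret_yes_no(transcript: str) -> str:
    """Map a recognized utterance to 'yes', 'no', or 'unknown'."""
    text = normalize(transcript)
    if text in AFFIRMATIVE:
        return "yes"
    if text in NEGATIVE:
        return "no"
    return "unknown"  # the application should re-prompt rather than guess


if __name__ == "__main__":
    for utterance in ["Absolutely!", "Nope.", "Maybe later"]:
        print(utterance, "->", interpret_yes_no(utterance))
```

An application that checks only for the literal word “yes” would reject “absolutely” even though the engine transcribed it perfectly, which is the failure Greenwell describes.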
It's this disjointed experience that the industry needs to address, Greenwell told The Motley Fool.
Edited by Juliana Kenny