When you think of interactive voice response (IVR) systems, one of the first things that usually comes to mind is text-to-speech (TTS) capability. According to this Plum Voice blog, TTS engines are a “critical component of a majority of IVR systems because they allow for natural interactions between the speaker and the voice application.” TTS is indeed an important part of IVR systems today, as it lets a voice application speak dynamic text to callers rather than rely solely on prerecorded prompts; however, it isn’t the be-all and end-all of seamless IVR interaction. TTS pronunciation can sometimes be way off, which, as with any technology, is realistically to be expected. The question is, could language-specific optimization put an end to mispronunciation for good?
“Developers are constantly working to correct these mispronunciations and errors, but with millions of words present in the English lexicon alone (and new words being added daily) it is impossible to correct every possible mispronunciation,” says Plum. “Add that to the fact there are a huge number of languages that businesses would like to integrate with TTS systems, and correct pronunciation becomes an even more complex proposition.”
Language-specific optimization tries to control this by spelling words out in a phonetic alphabet, but many remain skeptical. It’s a complex question when you think about it: Could a TTS engine be made to produce natural-sounding speech by feeding it text that is formatted phonetically?
Plum poses this challenging question in its blog, and CEO Andrew Kuan has the answer.
One thing is for sure: for IVR to evolve further, TTS speech must sound more natural. Kuan, however, notes that “just because you can notate the pronunciation of a word doesn’t mean that it will result in a natural reproduction of how it would be spoken by a native speaker.”
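For readers wondering what phonetically formatted text might look like in practice, here is a minimal sketch in Python. It assumes the engine accepts SSML, the W3C speech markup whose phoneme element carries a pronunciation in a phonetic alphabet such as IPA. The function name phoneme_ssml and the IPA string are illustrative assumptions, not anything taken from Plum's platform.

```python
import xml.etree.ElementTree as ET

def phoneme_ssml(parts):
    """Build an SSML <speak> document from a list of parts.

    Each part is either a plain string or a (word, ipa) tuple.
    Tuples are wrapped in <phoneme> so the engine is told how to say them.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-US"})
    last = None  # most recent <phoneme> element, if any
    for part in parts:
        if isinstance(part, tuple):
            word, ipa = part
            last = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ipa})
            last.text = word
        elif last is None:
            # Plain text before the first <phoneme> belongs to <speak> itself.
            speak.text = (speak.text or "") + part
        else:
            # Plain text after a <phoneme> is stored as that element's tail.
            last.tail = (last.tail or "") + part
    return ET.tostring(speak, encoding="unicode")

# Illustrative only: the IPA below is one plausible local pronunciation.
ssml = phoneme_ssml(["Welcome to ", ("Louisville", "ˈluːəvəl"), "."])
print(ssml)
```

The markup fixes only the sounds of a single word. It says nothing about rhythm, stress across the sentence, or intonation.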
The point he makes is interesting, and it echoes a previously published article about how pronunciation and geography go hand in hand. Voice talent Ardeth Ohm-Moser elaborates on this in the article, saying that if she were to say certain place names such as “Louisville” or “Hawaii,” she’d be instructed to pronounce them according to who her audience would be.
Essentially, language-specific optimization may be a quick fix at best. Kuan adds that while the phonetic alphabet may yield the correct pronunciation of individual words, syntax and inflection are equally significant components of natural speech and must still be worked out.
“The more relevant consideration when trying to identify a highly functioning TTS system is looking at its components, including whether or not it has a robust dictionary and a high functioning text parser,” writes Plum. “If an engine possesses these qualities, it is likely that it is a good TTS system that can function without much language-specific optimization or other superfluous input.”
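To make the two components Plum names a little more concrete, here is a rough Python sketch of a text parser that feeds a pronunciation dictionary. Everything in it is hypothetical: the tiny LEXICON, the ABBREVIATIONS table, the IPA strings, and the function names are made up for illustration, and a production engine would be far more elaborate.

```python
import re

# Hypothetical mini-dictionary mapping words to illustrative IPA strings.
LEXICON = {
    "welcome": "ˈwɛlkəm",
    "to": "tuː",
    "hawaii": "həˈwaɪi",
}

# Hypothetical expansions the text parser applies before dictionary lookup.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}

def parse(text):
    """Lowercase the text, expand known abbreviations, and split into words."""
    tokens = []
    for raw in text.lower().split():
        raw = ABBREVIATIONS.get(raw, raw)
        # Keep letters and apostrophes only, dropping punctuation.
        tokens.extend(re.findall(r"[a-z']+", raw))
    return tokens

def lookup(text, lexicon=LEXICON):
    """Return (known, unknown): dictionary hits and words the dictionary lacks."""
    known, unknown = [], []
    for token in parse(text):
        if token in lexicon:
            known.append((token, lexicon[token]))
        else:
            unknown.append(token)
    return known, unknown

known, unknown = lookup("Welcome to Hawaii, Dr. Ohm")
print(known)
print(unknown)
```

Words that land in the unknown list are the ones an engine would have to guess at, which is where a robust dictionary earns its keep.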
To stay in the know about everything IVR, be sure to follow Plum Voice on Twitter @PlumVoice.
Edited by Amanda Ciccatelli