Speech technologies provider Loquendo makes it easy to create sophisticated speech-enabled applications in many widely spoken languages, thanks to its complete product line for navigation and handheld devices, desktop PCs and telephony servers.
The company has over 30 years of R&D experience and guarantees the same wide range of languages and the same core engine in all environments. Its technology offerings include Loquendo TTS, Loquendo ASR, Loquendo Speaker Verification and the VoxNauta VoiceXML Platform.
I took some time recently to chat with Paolo Baggia, Director of International Standards at Loquendo to find out more about the company and the speech technologies market. That interview can be found HERE.
During our conversation, Baggia also highlighted the importance of the W3C for speech technologies and discussed with TMCnet the new standards the consortium has introduced.
The World Wide Web Consortium (W3C) brings together member organizations from across the world, a full-time staff and the public to develop Web standards and guidelines that help the Web reach its full potential. The consortium believes that all Web technologies, hardware and software, used to access the Web should be able to work together. Since 1994, it has published over 110 W3C Recommendations.
What is the W3C and what do they offer for the speech framework and standards?
W3C is a standardization consortium that is creating a set of interoperable standards. In the W3C, two groups work actively on speech frameworks and standards:
The first, the Voice Browser working group, completed the VoiceXML 2.0 and 2.1 standards in recent years, along with many other standards that allow easy, interoperable access to speech technologies: SRGS and SISR for speech grammars, SSML for speech synthesis, PLS for both recognition and synthesis, and CCXML for call control. These standards comprise the Speech Interface Framework, proposed in 2000 by James A. Larson, co-chair of the Voice Browser working group. Today the Framework is not only almost complete, but also widely supported in the markets for voice platforms, IVRs and speech engines.
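To give a flavor of these markup languages, here is a minimal SSML 1.0 document that a synthesizer would render with a pause and adjusted prosody (the prompt text itself is invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Welcome to the demo line.
  <!-- Insert a half-second pause before the instructions -->
  <break time="500ms"/>
  <prosody rate="slow">
    Please <emphasis>listen carefully</emphasis> to the options that follow.
  </prosody>
</speak>
```

Because the document is plain XML, any conforming synthesizer can interpret it, regardless of vendor.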
Another active area is the Multimodal Interaction working group, whose goal is to define standards for the creation of multimodal interfaces. This group is led by Deborah Dahl and is currently working on producing new standards.
The consortium recently released new standards and architectural changes. Can you talk a little about them and the benefit they provide?
The most recent W3C Recommendations (the final stage of a W3C specification) are:
- PLS 1.0, Pronunciation Lexicon Specification, which makes it possible to improve the pronunciation of words through phonetic alphabets or by transliteration. It is a standards-based way to improve both speech synthesis and speech recognition performance, and it complements and completes the Speech Interface Framework mentioned earlier.
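As an illustrative sketch, a minimal PLS 1.0 lexicon might define a custom pronunciation like this (the IPA transcription shown is an assumption for illustration, not an official one):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <!-- Written form as it appears in prompts or grammars -->
    <grapheme>Loquendo</grapheme>
    <!-- Phonetic pronunciation in IPA -->
    <phoneme>loˈkwɛndo</phoneme>
  </lexeme>
</lexicon>
```

The same lexicon document can be referenced from both SSML (to guide synthesis) and SRGS (to guide recognition).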
- EMMA 1.0, Extensible MultiModal Annotation, is a rich markup language that represents multimodal inputs, whether via voice, gesture or pen/stylus. It can convey complex results, including N-best alternatives for speech recognition and word lattices (graphs of word hypotheses). EMMA will prove especially valuable for mobile application developers by simplifying the creation of multimodal applications that combine multiple input types, such as speech, touch screens and styluses.
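As a sketch of what an EMMA result can look like, the following hypothetical recognizer output carries two N-best alternatives for a spoken city name, each with a confidence score (the application payload inside each interpretation is illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- N-best list: two competing hypotheses for one utterance -->
  <emma:one-of id="nbest" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights to boston">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.62"
                         emma:tokens="flights to austin">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```

A dialog manager receiving this document can pick the top hypothesis or ask the user to disambiguate, using the same representation whether the input came from speech, pen or touch.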
How are these technologies similar to the way the Web works?
There are similarities because, just as with a Web browser - and there are many, from proprietary to open source - you can browse the entire Web over the HTTP protocol. The basis for building speech applications is the same: a voice platform contains a VoiceXML interpreter that fetches documents over HTTP from a Web application, which generates VoiceXML instead of HTML.
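As a minimal sketch of that model, a Web server could return a VoiceXML 2.0 document like the one below; the grammar URI and submit target are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="city">
      <prompt>Which city are you calling about?</prompt>
      <!-- SRGS grammar fetched over HTTP, like any Web resource -->
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Post the result back, much as an HTML form submit would -->
        <submit next="http://example.com/handle-city" namelist="city"/>
      </filled>
    </field>
  </form>
</vxml>
```

The interpreter plays the prompt, recognizes the caller's answer against the grammar, and submits the result back to the server, which responds with the next VoiceXML page - the same request/response cycle as an HTML Web application.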
What role do VoiceXML and Voice Browsing play in improving standards?
The role of VoiceXML has been of paramount importance because it proposed a Web-based model for describing voice and DTMF applications. This idea immediately triggered a massive transformation of voice platforms, which adopted these languages as their primary interface. It helped the industry move from a legacy world of platforms with proprietary application development and proprietary use of speech technologies to standard VoiceXML platforms and a standardized way of developing voice applications.
I also believe that the standards proposed by the Voice Browser working group have helped to increase adoption in the speech applications industry.