New speech technologies making noise

A key standard for building speech-based telephony applications, VoiceXML 2.0, has received a final nod of approval from the World Wide Web Consortium.

The standard's official graduation comes just days before Microsoft is expected to formally launch its Speech Server products - which adhere to a competing standards effort - at the SpeechTEK conference in San Francisco.

The W3C advanced VoiceXML 2.0, along with the supporting Speech Recognition Grammar Specification (SRGS), to final "recommendation" status, effectively making them Web standards. These are the most mature of a handful of specifications in the W3C's evolving Speech Interface Framework.

The Speech Interface Framework aims to define a set of standards for building applications that let people interact with Web-based services over a telephone. The applications use a variety of voice-based interfaces that range from keypads and spoken commands to music and synthetic speech. Within the framework, VoiceXML controls how a voice application interacts with a user. Developers use SRGS to describe the words and phrases that end users are expected to give in response to spoken prompts.
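To make that division of labor concrete, here is a minimal Python sketch, not VoiceXML or SRGS markup. The dialog loop stands in for the role VoiceXML plays, and a plain list of accepted phrases stands in for an SRGS grammar. The names `run_prompt`, `normalize` and `listen` are hypothetical and belong to no real speech platform. The `listen` callback is an assumed stand-in for a speech recognizer.

```python
from typing import Callable, Iterable, Optional


def normalize(utterance: str) -> str:
    """Lowercase the utterance and collapse runs of whitespace."""
    return " ".join(utterance.lower().split())


def run_prompt(
    prompt: str,
    grammar: Iterable[str],
    listen: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Play a prompt and accept only phrases listed in the grammar.

    `listen` is a hypothetical recognizer: it receives the prompt text
    and returns whatever the caller said. Returns the matched phrase,
    or None if no attempt matched the grammar.
    """
    accepted = {normalize(phrase) for phrase in grammar}
    for _ in range(max_attempts):
        heard = normalize(listen(prompt))
        if heard in accepted:
            return heard
        # No match: loop again, which re-plays the same prompt.
    return None
```

A caller would pass a real recognizer as `listen`. In a test, a function that returns canned strings is enough to exercise the reprompt path.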

Other elements of the framework include Speech Synthesis Markup Language (SSML), which is used for creating spoken prompts; Voice Browser Call Control (CCXML), which provides telephony call-control support for VoiceXML and other dialog systems; and Semantic Interpretation for Speech Recognition, which defines links between grammar rules and application semantics so that an application recognizes that two spoken variations of the same element, such as "Coke" and "Coca-Cola," should be treated as the same response.
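The "Coke" and "Coca-Cola" case can be sketched in a few lines of Python. This is an illustration of the idea only, not the W3C specification's syntax. The table `SEMANTIC_TAGS`, the function `interpret` and the canonical value `"coca-cola"` are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical table linking recognized phrases to one application value.
SEMANTIC_TAGS = {
    "coke": "coca-cola",
    "coca-cola": "coca-cola",
    "coca cola": "coca-cola",
}


def interpret(utterance: str) -> Optional[str]:
    """Return the canonical value for a recognized phrase, or None."""
    key = " ".join(utterance.lower().split())
    return SEMANTIC_TAGS.get(key)
```

With this table, `interpret("Coke")` and `interpret("Coca-Cola")` return the same value, so the application handles one response rather than two.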

VoiceXML is already broadly adopted. It has become a standard scripting language for making Web content accessible via voice and phone - letting users make selections and provide information by talking instead of pressing numbers on a keypad.

"VoiceXML allows users to create a description of a dialog between computer and user that can output text, graphics, synthesized speech, digitized audio - and also provide a means to recognize inputs from all these sources," says Ron Schmelzer, a senior analyst at ZapThink LLC. "What makes VoiceXML cool is that you can specify an interface for application functionality that is not Web-based, but specify it in a way that allows Web developers to control how these voice-based application interfaces work."

Scores of vendors have deployed VoiceXML 2.0-compliant applications, products and services, including Hewlett-Packard Co., IBM Corp., Lucent Technologies Inc., Motorola Inc. and Nuance Communications Inc.

Meanwhile, Microsoft is making waves with its Speech Server 2004 speech-recognition platform. Bill Gates, Microsoft's chairman and chief software architect, is scheduled to formally launch the Standard and Enterprise editions at the SpeechTEK conference.

With Speech Server, Visual Studio .Net developers can write applications that recognize spoken commands, convert text to speech and generate spoken prompts by adding code based on XML and Speech Application Language Tags (SALT) technologies to existing Web applications.

SALT is the Microsoft-backed alternative to VoiceXML. It's not nearly as far along in the standards process - currently it is under consideration by the W3C. But it has industry support: Members of the SALT Forum include founding companies Cisco Systems Inc., Comverse Inc., Intel Corp., Microsoft and Koninklijke Philips Electronics NV, along with Compaq and Siemens Enterprise Networks.

Speech Server takes calls and communicates with a Web server via XML and SALT, and makes applications offered online available through the phone, says James Mastan, director of marketing for Microsoft's Speech Server group.

Developers can use Visual Studio to build applications, and Speech Server runs just like any other Microsoft server product, Mastan says. "It's not some black box in a call center that you have to program for in some weird language and you can't maintain yourself because you don't know how it works," he says. Ease of use is at the center of the VoiceXML vs. SALT voice browser specification battle.

Microsoft argues that SALT is easier to use because of its Visual Basic tie-in, whereas VoiceXML requires more telephony-type skills, says Bill Meisel, president of speech-technology research company TMA Associates. There's some validity to that position, he says.

"It's much easier for an IT department that has been focused on a Microsoft development environment to use a SALT solution," Meisel says. "On the other hand, telephone application developers classically have used tool kits that are very specialized for telephony. For them, VoiceXML is a much more natural solution."

Steve Chambers, general manager of ScanSoft Inc.'s network speech division, expects users of the two standards to become divided along familiar lines: .Net converts choosing SALT, and Java shops going with VoiceXML. ScanSoft, whose products include speech-recognition and text-to-speech converters, will support both standards, Chambers says.

Dave Raggett, activity lead for the W3C's voice browser and multimodal working groups, says the two specifications could someday merge. The voice browser working group is focusing on the next major version of VoiceXML, which will incorporate ideas from SALT, among other sources, Raggett says.

Meanwhile, Microsoft's entry will stir the speech-recognition market, but the vendor likely won't become a competitive threat until it releases its second- and third-generation products, says Steve Cramoysan, a principal analyst at Gartner.


Ann Bednarz

Network World