New speech technologies making noise

A key standard for building speech-based telephony applications, VoiceXML 2.0, has received a final nod of approval from the World Wide Web Consortium.

The standard's official graduation comes just days before Microsoft is expected to formally launch its Speech Server products - which adhere to a competing standards effort - at the SpeechTEK conference in San Francisco.

The W3C advanced VoiceXML 2.0, along with the supporting Speech Recognition Grammar Specification (SRGS), to final "recommendation" status, effectively making them Web standards. These are the most mature of a handful of specifications in the W3C's evolving Speech Interface Framework.

The Speech Interface Framework aims to define a set of standards for building applications that let people interact with Web-based services over a telephone. The applications use a variety of voice-based interfaces that range from keypads and spoken commands to music and synthetic speech. Within the framework, VoiceXML controls how a voice application interacts with a user. Developers use SRGS to describe the words and phrases that end users are expected to give in response to spoken prompts.

Other elements of the framework include Speech Synthesis Markup Language (SSML), which is used for creating spoken prompts; Voice Browser Call Control (CCXML), which provides telephony call-control support for VoiceXML and other dialog systems; and Semantic Interpretation for Speech Recognition, which defines links between grammar rules and application semantics so that an application recognizes that two spoken variations of the same element, such as "Coke" and "Coca-Cola," should be treated as the same response.
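
The "Coke" and "Coca-Cola" case can be shown with a small Python sketch. It is not the W3C's Semantic Interpretation syntax, and the names used here (DRINK_GRAMMAR, interpret) are hypothetical; it only illustrates the idea of mapping several spoken variants to one value that the application sees.

```python
# Hypothetical sketch: several spoken variants resolve to one canonical value.
# DRINK_GRAMMAR and interpret() are illustrative names, not part of any W3C spec.

DRINK_GRAMMAR = {
    "coke": "coca-cola",
    "coca-cola": "coca-cola",
    "coca cola": "coca-cola",
    "pepsi": "pepsi",
}


def interpret(utterance):
    """Return the canonical value for a recognized phrase, or None if it is out of grammar."""
    # Normalize case and whitespace before the lookup.
    key = " ".join(utterance.lower().split())
    return DRINK_GRAMMAR.get(key)
```

Under this assumption, the application only ever handles the canonical value, so "Coke" and "Coca-Cola" lead to the same response.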

VoiceXML is already broadly adopted. It has become a standard scripting language for making Web content accessible via voice and phone - letting users make selections and provide information by talking instead of touching numbers on a keypad.

"VoiceXML allows users to create a description of a dialog between computer and user that can output text, graphics, synthesized speech, digitized audio - and also provide a means to recognize inputs from all these sources," says Ron Schmelzer, a senior analyst at ZapThink LLC. "What makes VoiceXML cool is that you can specify an interface for application functionality that is not Web-based, but specify it in a way that allows Web developers to control how these voice-based application interfaces work."

Scores of vendors have deployed VoiceXML 2.0-compliant applications, products and services, including Hewlett-Packard Co., IBM Corp., Lucent Technologies Inc., Motorola Inc. and Nuance Communications Inc.

Meanwhile, Microsoft is making waves with its Speech Server 2004 speech-recognition platform. Bill Gates, Microsoft's chairman and chief software architect, is scheduled to formally launch the Standard and Enterprise editions at the SpeechTEK conference.

With Speech Server, Visual Studio .Net developers can write applications that recognize spoken commands, convert text to speech and generate spoken prompts by adding code based on XML and Speech Application Language Tags (SALT) technologies to existing Web applications.

SALT is the Microsoft-backed alternative to VoiceXML. It's not nearly as far along in the standards process - currently it is under consideration by the W3C. But it has industry support: Members of the SALT Forum include founding companies Cisco Systems Inc., Comverse Inc., Intel Corp., Microsoft and Koninklijke Philips Electronics NV, along with Compaq and Siemens Enterprise Networks.

Speech Server takes calls and communicates with a Web server via XML and SALT, and makes applications offered online available through the phone, says James Mastan, director of marketing for Microsoft's Speech Server group.

Developers can use Visual Studio to build applications, and Speech Server runs just like any other Microsoft server product, Mastan says. "It's not some black box in a call center that you have to program for in some weird language and you can't maintain yourself because you don't know how it works," he says. Ease of use is at the center of the VoiceXML vs. SALT voice browser specification battle.

Microsoft argues that SALT is easier to use because of its Visual Basic tie-in, whereas VoiceXML requires more telephony-type skills, says Bill Meisel, president of speech-technology research company TMA Associates. There's some validity to that position, he says.

"It's much easier for an IT department that has been focused on a Microsoft development environment to use a SALT solution," Meisel says. "On the other hand, telephone application developers classically have used tool kits that are very specialized for telephony. For them, VoiceXML is a much more natural solution."

Steve Chambers, general manager of ScanSoft Inc.'s network speech division, expects users of the two standards to become divided along familiar lines: .Net converts choosing SALT, and Java shops going with VoiceXML. ScanSoft, whose products include speech-recognition and text-to-speech converters, will support both standards, Chambers says.

Dave Raggett, activity lead for the W3C's voice browser and multimodal working groups, says the two specifications could someday merge. The voice browser working group is focusing on the next major version of VoiceXML, which will incorporate ideas from SALT, among other sources, Raggett says.

Meanwhile, Microsoft's entry will stir the speech-recognition market, but the vendor likely won't become a competitive threat until it releases its second- and third-generation products, says Steve Cramoysan, a principal analyst at Gartner.

Ann Bednarz

Network World