Speech vendors shout for standards

The battle for speech technology standards is set to escalate next week when a collection of industry leaders submits to the World Wide Web Consortium (W3C) a proposed framework for delivering combined graphics and speech on handheld devices.

The VoiceXML Forum, headed by IBM Corp., Nuance Communications Inc., Oracle Corp., and Lucent Technologies Inc., will announce a proposal for a multimodal technology standard at the Telephony Voice User Interface Conference in Scottsdale, Arizona.

Meanwhile, Microsoft Corp. will counter with its own news, using the same conference to announce the addition of another major speech vendor to its SALT (Speech Application Language Tags) Forum. The as-yet-unnamed vendor intends to rewrite its components to work with Microsoft's speech platform.

The announcement will follow the addition of 18 new members to the SALT Forum, a proposed alternative to VXML's multimodal solution.

New members of the SALT Forum include Compaq Computer Corp. and Siemens Enterprise Networks LLC. Founding members include Cisco Systems Inc., Comverse Technology Inc., Intel Corp., Microsoft, Koninklijke Philips Electronics NV, and SpeechWorks International Inc.

Beyond the question of which industry-backed consortium has the best multimodal solution, larger issues are at play, according to industry observers, some of whom preferred not to go on record for fear of antagonizing Microsoft, a powerful partner.

"The Microsoft partnership announcement is about a major speech technology company redoing their technology for Microsoft's .Net strategy," said one industry insider who asked not to be attributed.

Microsoft's speech platform will encourage developers to be .Net compliant by tightly connecting its SAPI (speech application programming interface) Version 5.1, with SALT and Visual Studio. Currently, only Microsoft's own speech engine is SAPI 5.1-compliant. Ultimately Microsoft appears to be making speech a greater part of its .Net plans.

"The SALT component in Visual Studio is in progress. The alpha is out, beta by middle of the year," said James Mastan, group product manager at Microsoft for its .Net Speech Technologies in Redmond, Wash.

As Microsoft upgrades SAPI to be SALT-compliant, independent software developers wanting to create applications for the Microsoft platform will also have to be SAPI 5.1- and SALT-compliant.

However, SALT is still in its early stages. The first proposed specification is not expected until later this year. Most mainstream speech developers are currently creating VoiceXML speech applications built on Java and the J2EE (Java 2 Enterprise Edition) environment, and running on BEA, IBM, Oracle, and Sun application servers.

This week General Magic and InterVoice-Brite announced a partnership to develop Interactive Voice Response (IVR) enterprise solutions for "J2EE environments," using General Magic's VXML technology.

"There is a whole infrastructure being created on J2EE and IBM's WebSphere, BEA, open source J2EE Web servers like JBoss/Tomcat, and on Solaris," said Bill Meisel, president of TMA Associates in Tarzana, Calif.

Using J2EE Web servers, developers can deliver what is called "dynamic VXML."

The power of dynamic VXML on a J2EE platform is its ability to access a company's existing database using voice control and deliver responses customized to the specific caller.

For example, if a salesperson called and used voice to access customer files, that event would trigger access to customers only in that salesperson's territory. A business customer ordering office supplies might be identified by phone number, and a set of customized voice prompts and answers would be generated based on past orders.
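The office-supplies scenario can be sketched in a few lines. This is only an illustration of the idea, not any vendor's implementation: it is written in Python for brevity rather than as the Java servlet or JSP page a J2EE deployment would use, and the `ORDER_HISTORY` table, the `build_vxml` function, the phone number, and the customer name are all hypothetical stand-ins for a company's real database and application code. Only the `vxml`, `form`, `block`, and `prompt` elements come from the VoiceXML 2.0 language itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a company's existing database:
# caller phone number -> customer name and past orders.
ORDER_HISTORY = {
    "5550100": {"name": "Acme Office", "past_orders": ["printer paper", "toner"]},
}


def build_vxml(caller_number: str) -> str:
    """Return a VoiceXML document whose prompt depends on who is calling."""
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")

    customer = ORDER_HISTORY.get(caller_number)
    if customer is None:
        # Unknown caller: fall back to a generic prompt.
        prompt.text = "Welcome. Please say the item you would like to order."
    else:
        # Known caller: build the prompt from that customer's past orders.
        items = " or ".join(customer["past_orders"])
        prompt.text = (
            f"Welcome back, {customer['name']}. "
            f"Would you like to reorder {items}?"
        )
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_vxml("5550100"))
```

The point of the sketch is that the VXML page is generated per call instead of being stored as a static file, so the same request from two different phone numbers yields two different dialogs.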

Until recently, Microsoft offered only a simple set of speech APIs, known as SAPI. Now, through acquisition and internal development, it has its own powerful speech engine, which it is giving away to developers royalty-free, said Peter Mcgregor, an independent software vendor creating speech products.

Microsoft redeveloped SAPI in Version 5.1 to run on its new speech engine, while simultaneously proposing SALT as an alternative to VXML. Wrapping it all up in a marketing context, Microsoft's Mastan called the company's collection of speech technologies a "platform," a term previously not used.

He indicated the next step may be to offer Microsoft's speech platform as part of its application server.

"We haven't decided on our go-to-market configuration, but you can think of it as a server that we would sell like any other .Net component. It's a server with a bundled set of components made available to build applications on top of these components," Mastan said.

He would not say if Microsoft's speech server would be bundled free into its application server in the same way as its Mobile Information Server.

However, becoming Microsoft speech platform-compliant may be a small price to pay, according to one independent software vendor, because of what he gets in return.

"I don't have to pay royalties for the Microsoft engine which can save me as much as US$6 to $7 per package in fees," said Mcgregor, a speech ISV, currently using the Microsoft platform for development.

"Microsoft finally has a good engine. As good as Dragon and IBM," Mcgregor said.

Meanwhile, the competing speech engines from the likes of IBM and Nuance are not Microsoft SAPI 5.1-compliant. If you want the free engine you need to be compliant with 5.1 and SALT, notes the developer.

"Everybody has to play catch up with Microsoft," Mcgregor said.

Which specification is better, SALT, which is not due to be released until later this year, or VXML, whose Version 2 is now out for review, is a question that only developers can settle. Each side claims the other's specification is deficient.

Microsoft's X.D. Huang, general manager of Microsoft .Net Speech Technologies, said that no matter what the VXML Forum claims, VXML will never be a good platform for multimodal speech.

"VXML is just not technically good enough and it doesn't matter what you do. You can beat a dead horse for a long time, but no matter how you beat it, it is still dead," Huang said.

IBM's William S. "Ozzie" Osborne, general manager of IBM Voice Systems in Somers, N.Y., has a different point of view.

"I hope that we get to one standard. Multiple standards fragment the marketplace and create a diversion. I would like to see us get to a standard that is industry-wide and not proprietary. What we are proposing to the W3C, using VXML for speech and XHTML for graphics in a single program, is cheaper and easier than SALT without having to have the industry redo everything they have done," Osborne said.

However, according to conference organizer Meisel of TMA Associates, the debate could be overstated.

"Neither SALT nor VXML turns a developer into a speech programmer," Meisel said.

Ephraim Schwartz

Computerworld