IBM ramps up speech technology products and research

In the belief that speech technology is a big, fast-moving, and diverse market, IBM will put its stake in the ground over the next two quarters by giving its portfolio of voice recognition systems a new umbrella term -- Conversational Services.

The company will roll out products that will include speech translation, multimodal interfaces, middleware, natural-language understanding (NLU), text-to-speech, and biometrics.

IBM will soon introduce one of the first products to use visual cues -- such as the movements of the lips and mouth -- to help interpret the spoken word, according to Dr. David Nahamoo, senior manager of the human language technologies department at IBM.

Nahamoo said the product is already in beta with a number of enterprises and will be available in about two years.

Over the longer term, the visual recognition system could also help in fixed-place environments where gestures add value. In customer relationship management applications, for example, call centre personnel will be able to gauge the unspoken mood of a customer by interpreting body language, Nahamoo said.

"The face is sending a message, happiness, sadness, anger. The challenge is how do you model that and integrate it on top of the other [speech] technologies," Nahamoo said.

In the short term, IBM's visual recognition system -- now in beta -- uses a microphone, a camera to monitor lip and mouth movement, and a set of business rules built into the recognition system.

"It might have a policy that if the face is not looking at the camera, the system understands that the person is not talking to me and so the computer can eliminate the sounds as noise," Nahamoo said.

Also, if the lips are not moving but the system is picking up words or sounds, that information is filtered out as extraneous, Nahamoo said.
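
The two rules Nahamoo describes can be pictured as a simple gate placed in front of the speech recogniser. The following is only a minimal sketch in Python, not IBM's implementation: the `Frame` structure, its field names, and the `accept_audio` function are hypothetical, and it assumes the camera and microphone have already been reduced to three yes/no observations per moment in time.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One synchronised audio/video observation (hypothetical structure)."""
    audio_detected: bool  # the microphone picked up words or sounds
    facing_camera: bool   # the speaker's face is turned toward the camera
    lips_moving: bool     # lip or mouth movement was detected


def accept_audio(frame: Frame) -> bool:
    """Return True if this frame's audio should be passed to the recogniser."""
    if not frame.audio_detected:
        return False  # nothing to recognise
    if not frame.facing_camera:
        return False  # rule 1: the person is not addressing the system
    if not frame.lips_moving:
        return False  # rule 2: sound without lip movement is extraneous
    return True


def filter_frames(frames):
    """Keep only the frames whose audio passes both visual rules."""
    return [f for f in frames if accept_audio(f)]
```

A real system would presumably weigh these cues rather than apply them as hard yes/no tests, but the sketch shows how visual information can discard audio before any recognition is attempted.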

Some of these technologies will be especially useful in noisy environments, such as a moving car or the trading floor of a stock market, noted Nigel Beck, IBM's director of Voice Systems.

"If the vocabulary in the system is small enough it can recognise some words even in noise, and can especially be trained for digits in something as noisy as a 10-decibel environment," Beck said.

The system builds a template over time for each movement of the lips and converts the information into the basic ones and zeroes that a computer understands.

The unit of visual analysis is called a "viseme," which is analogous to a phoneme, the smallest intelligible segment of sound in a word. A viseme is the smallest intelligible segment of a lip gesture; put together with other visemes, it allows the system to recognise the movements in aggregate as a word.
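
To illustrate how a sequence of visemes might be matched to words, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the viseme labels, the phoneme-to-viseme table, and the four-word lexicon are invented, and the article does not describe IBM's actual matching method. The sketch also shows why lip reading is usually paired with audio: sounds such as "p", "b", and "m" look alike on the lips, so one viseme sequence can fit several words.

```python
from collections import defaultdict

# Hypothetical viseme classes; the labels are invented for this example.
PHONEME_TO_VISEME = {
    "p": "LIPS_CLOSED", "b": "LIPS_CLOSED", "m": "LIPS_CLOSED",
    "f": "LIP_TO_TEETH", "v": "LIP_TO_TEETH",
    "ae": "OPEN",
    "t": "TONGUE_TIP", "d": "TONGUE_TIP", "n": "TONGUE_TIP",
}

# Toy lexicon with simplified pronunciations (also an assumption).
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fan": ["f", "ae", "n"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a camera would see."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def build_index(lexicon):
    """Group words by viseme sequence, so lookalike words share an entry."""
    index = defaultdict(list)
    for word, phonemes in lexicon.items():
        index[to_visemes(phonemes)].append(word)
    return index


def candidates(index, visemes):
    """Return, in alphabetical order, the words matching a viseme sequence."""
    return sorted(index.get(tuple(visemes), []))
```

In this toy lexicon, the sequence `LIPS_CLOSED, OPEN, TONGUE_TIP` matches "bat", "mat", and "pat" alike, so the audio signal would be needed to choose among them.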

In another recent development, IBM officials last week displayed a prototype add-on sled that fits onto the back of a Palm handheld. The speech sled contains a DSP (digital signal processing) chip and memory for translating speech to text, and can be used for executing commands to a contact database or appointments calendar, as well as for voice-activated phone dialling.

Ephraim Schwartz

PC World