Microsoft's custom voice recognition service hits public beta

The service lets developers tailor cloud voice recognition for specific scenarios

Companies building applications that use speech recognition have a new machine learning-based tool for making that recognition more accurate. Microsoft has opened the public beta of its Custom Speech Service, the company said Tuesday.

The service, formerly known as CRIS, lets customers train a speech recognition system for a specific scenario so that it produces more accurate results. For example, the Custom Speech Service can be trained to handle the background noise of a busy airport, or to better recognize voices from a particular group, such as children or people with different accents.

The Custom Speech Service currently supports English and Chinese. One of its advantages is that it can be trained to recognize the accents of non-native speakers.

Microsoft is making it available as part of its suite of Cognitive Services, a set of cloud-based tools aimed at opening up the fruits of the company’s artificial intelligence and machine learning research to the rest of the world.

Eight such cognitive services are currently generally available, and another 17 are in beta. More than 424,000 developers have tried the services since they launched, Microsoft said. Developers around the world can access the services, many of which are available for purchase through Microsoft Azure.

Each of the services has a free tier with strict usage limits, so developers can test the APIs without spending a cent. The Custom Speech Service has a complicated, tiered pricing model that combines a subscription fee with charges based on the number of voice samples fed into the system and the amount of acoustic adaptation training performed.

The Custom Speech Service is a key tool in the arsenal of Human Interact, a small game development shop using voice commands as the sole means of interaction for its forthcoming game Starship Commander. Custom speech recognition, along with Microsoft’s Language Understanding Intelligent Service (LUIS), makes up key parts of the voice recognition and understanding system that players use to guide their ship.

The service lets Human Interact build its own dictionary specific to Starship Commander, so the system can understand players when they ask about the Ecknians, the game's alien antagonists. Once a player's speech has been transcribed into machine-readable text, LUIS processes that text and translates it into game commands.
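The two-stage pipeline described above (speech transcribed to text, then text interpreted as a command) can be illustrated with a minimal sketch. This is not Human Interact's code or LUIS's API: the article does not describe either. The keyword matcher below is a hypothetical stand-in for a language-understanding service, and the intent names (`AskAbout`, `SetCourse`) and command strings are invented for illustration only.

```python
from dataclasses import dataclass


# Hypothetical intent result; field names are illustrative, not the LUIS response format.
@dataclass
class IntentResult:
    intent: str
    entities: dict


def understand(transcript: str) -> IntentResult:
    """Toy stand-in for a language-understanding step: crude keyword matching."""
    text = transcript.lower()
    if "ecknian" in text:
        return IntentResult("AskAbout", {"topic": "Ecknians"})
    if "set course for" in text:
        destination = text.split("set course for", 1)[1].strip()
        return IntentResult("SetCourse", {"destination": destination})
    return IntentResult("None", {})


def to_game_command(result: IntentResult) -> str:
    """Map a recognized intent to a (hypothetical) game command string."""
    handlers = {
        "AskAbout": lambda e: f"DIALOGUE:ask:{e.get('topic', 'unknown')}",
        "SetCourse": lambda e: f"NAV:set_course:{e.get('destination', 'unknown')}",
    }
    handler = handlers.get(result.intent)
    if handler is None:
        # Unrecognized input: ask the player to rephrase instead of guessing.
        return "DIALOGUE:clarify"
    return handler(result.entities)


# Stage 1 (speech-to-text) is assumed to have already produced these transcripts.
for transcript in ["Tell me about the Ecknians", "Set course for the nebula"]:
    print(to_game_command(understand(transcript)))
```

Keeping the understanding step separate from the command mapping mirrors the split the article describes: the recognizer only needs to produce accurate text, and the game logic only needs to handle a fixed set of intents.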

Both systems are important to the core gameplay of Starship Commander. Human Interact set out to make an interactive experience for virtual reality that was broadly accessible to a wide range of players, not just those who have been playing video games for years, creative director Alexander Mejia said.

"The answer was stupidly clear," Mejia said. "What if you just talk to somebody? I mean, if we put a person in front of you, and they start talking to you, would you talk back?"

To that end, the company opted to use the microphones that are built into the Oculus Rift and Gear VR systems and create a game that feels like a much more open-ended and immersive choose-your-own-adventure book.

Microsoft is far from the only company offering machine learning-based cloud voice recognition, but its services were the best fit for what the team is doing, Mejia said. They provide not only the custom dictionaries the team needs, but also fast response times and the ability to see and validate the output of the voice recognition system.

Two other cognitive services from Microsoft will reach general availability next month. The Content Moderator service is designed to automatically block objectionable content in text, videos, and images while allowing for human review of questionable cases. It can detect profanity in more than 100 languages and also lets customers supply their own custom lists of objectionable text.

The Bing Speech API is designed to give developers an easy, general-purpose way to convert speech to text and vice versa. It supports speech recognition in 18 languages, with dialects from 28 countries, including German, French, Chinese, Spanish, and Arabic. Developers can also use the API for text-to-speech in 10 languages, with support for dialects from 18 countries.

Microsoft is battling with a number of other cloud companies in this area, including Google, Amazon, and IBM, which each have their own set of machine intelligence-based tools.


Blair Hanley Frank

IDG News Service