Microsoft bolsters artificial intelligence with additions to Project Oxford

Developers will get access to advanced tools for face recognition, speech recognition and more

Microsoft's Project Oxford, a suite of developer tools based on the company's machine learning and artificial intelligence research, is getting a new quintet of services, the company announced at its Future Decoded conference in London. 

Developers can now take advantage of an emotion detection service that looks at a photo and lists an array of emotions that it detects on the subjects' faces. For each person in an image (up to a certain number), the service will pass back the probabilities that someone is expressing anger, happiness, fear, surprise, disgust, sadness, contempt or nothing at all.

According to Ryan Galgon, a senior program manager at Microsoft, the company built the service after it saw developers using Project Oxford's existing face detection technology in applications that run sentiment analysis on photographs. The new service opens the door to other kinds of applications, such as editing photos based on the feelings of the people in them.

That's not to say Microsoft has handed developers an emotion-detection expert in a box. The service can only handle static images at this point, and Galgon said Microsoft is more confident in some of its emotion detection models (like finding happiness) than in others (contempt and disgust). 

Speaking of video, by the end of this year Project Oxford will have beta support for video tools including motion detection and image stabilization. The suite will also have face-tracking tools that will log where people are in each frame of a video so users can analyze what's going on.  

Depending on a video's size, processing it with one of those services could take a while. Microsoft caps the size of the video files the service accepts, and Galgon suggests that developers scale down the resolution of large files.

In voice, Microsoft announced Custom Recognition Intelligent Services (CRIS), which lets developers create voice-recognition models for specific circumstances. It's useful for taking dictation that a traditional model wouldn't be well suited to, like the speech of young kids or interactions with a kiosk at a baseball park.

To get more personalized results, developers have to feed the service a set of audio files, along with transcriptions of the speech, to build up the speech model.

There's also a forthcoming speaker recognition feature in Project Oxford's speech toolset. Right now, it's not really built to do something like analyze a recorded conversation and pick out who's speaking when. But it does let developers take a short clip of someone talking and determine whether the person speaking matches the person it's been trained to recognize. 

It's the sort of thing Galgon sees working as a lightweight form of authentication: Not as secure as a password or fingerprint but useful as one signal to see if someone is who they say they are. 

Put together, speaker recognition and face detection could be used as part of the foundation of a security system similar to Google's Project Abacus, which authenticates a user based on a variety of signals including voice and facial recognition. Abacus is still in development.

All the processing for these services is handled in Microsoft's cloud, which means applications using Project Oxford APIs have to be connected to the Internet. Galgon said the company has heard from plenty of developers asking to use Project Oxford's capabilities offline, but Microsoft wants to keep them online-only for now.

Releasing the tools to the public could help attract users to the company's Azure cloud platform, which features three Project Oxford services as part of the Cortana Analytics Suite. Developers can try out all features for free and talk to Microsoft if they need to use more than what's available through Project Oxford's free usage tier. 

Blair Hanley Frank

IDG News Service