EU helps machine translation with one million sentences

The EU is offering translation software developers free access to one million sentences translated between 22 of the European Union's 23 official languages

The European Commission is offering translation software developers free access to around one million sentences translated between 22 of the European Union's 23 official languages. It hopes the data will help improve the quality of a variety of language tools, including grammar and spelling checkers, online dictionaries and machine translators -- particularly in less well-served languages such as Latvian or Romanian.

The sentences are mostly drawn from the "Acquis Communautaire," the body of law that must be implemented by all new E.U. member states, and include the treaties, directives and regulations adopted by the E.U., and rulings from the European Court of Justice.

Translated by professional translators, they cover topics such as IT, telecommunications, labor law, agriculture and fishing.

The translations form part of the "translation memory" used by the Commission's permanent staff of 1,750 translators. They are matched up, sentence by sentence, in each of the 22 languages, and tagged with subject classifications.

The matching and tagging make the sentences especially useful for developers of statistical machine translation software, who must amass a corpus of thousands of matched sentences in the languages between which they wish to translate, so that they can calculate the most likely translation for any given expression. Since the sentences have already been matched, developers will save time, and the immense size of the Acquis Communautaire will help make their calculations more accurate.
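To illustrate how a statistical system might "calculate the most likely translation" from matched sentences, here is a minimal Python sketch. It assumes an IBM Model 1-style expectation-maximization estimate over word co-occurrences, which is one classic approach and not necessarily what any particular developer uses. The function names and the three-sentence English-German corpus are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def train_translation_table(pairs, iterations=10):
    """Estimate t(target_word | source_word) from matched sentence pairs.

    Assumption: an IBM Model 1-style EM procedure with naive
    whitespace tokenization; real systems are far more elaborate.
    """
    corpus = [(src.lower().split(), tgt.lower().split()) for src, tgt in pairs]
    target_vocab = {word for _, tgt in corpus for word in tgt}

    # Start with every target word equally likely for every source word.
    uniform = 1.0 / len(target_vocab)
    table = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words
        # of the same sentence pair, in proportion to the current table.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(table[(tw, sw)] for sw in src)
                for sw in src:
                    share = table[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # M-step: renormalize the shared counts into probabilities.
        for (tw, sw), c in count.items():
            table[(tw, sw)] = c / total[sw]
    return table


def most_likely_translation(table, source_word):
    """Return the highest-probability target word, or None if unseen."""
    candidates = {tw: p for (tw, sw), p in table.items() if sw == source_word}
    return max(candidates, key=candidates.get, default=None)


# Hypothetical toy corpus of matched sentences.
toy_pairs = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]
toy_table = train_translation_table(toy_pairs)
print(most_likely_translation(toy_table, "book"))
```

With only three sentence pairs the estimates are crude. The same counting becomes more reliable as the number of matched sentences grows, which is why a large pre-aligned corpus is valuable.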

Until now, developers have typically resorted to scouring the Web for texts translated into several languages, and using other software tools to make a guess at where sentences start and end in order to match them up.
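The guesswork involved in that older approach can be sketched in a few lines of Python. This is a deliberately naive illustration, not any specific tool: it assumes that sentences end at a period, exclamation mark or question mark followed by whitespace, and that the two texts line up one to one in the same order. The function names and the `max_length_ratio` threshold are hypothetical.

```python
import re


def split_sentences(text):
    """Guess sentence boundaries at ., ! or ? followed by whitespace.

    This is a crude heuristic: it wrongly splits after abbreviations
    such as "E.U." when they appear in the middle of a sentence.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def pair_by_position(source_text, target_text, max_length_ratio=2.0):
    """Pair the n-th source sentence with the n-th target sentence.

    Returns an empty list when the sentence counts differ, and drops
    pairs whose character lengths differ by more than max_length_ratio,
    since those are likely to be misaligned.
    """
    source = split_sentences(source_text)
    target = split_sentences(target_text)
    if len(source) != len(target):
        return []
    pairs = []
    for s, t in zip(source, target):
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        if ratio <= max_length_ratio:
            pairs.append((s, t))
    return pairs


english = "The house is small. The book is old."
german = "Das Haus ist klein. Das Buch ist alt."
for pair in pair_by_position(english, german):
    print(pair)
```

A corpus that has already been matched sentence by sentence makes this error-prone step unnecessary.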

While the release of the data will benefit software developers, the Commission is not being entirely altruistic: it hopes that the availability of better, cheaper automated translation software will help speakers of the E.U.'s minority languages by giving them access to online information currently available only in the more widely spoken languages.

Interested developers can download the texts from the Web site of the Commission's Directorate General of Translation. They will also need the text extraction program and its library.

Peter Sayer

IDG News Service
