Imagine being able to communicate via instant messaging with people who don't speak your language, or translate a foreign-language news bulletin automatically when you open it in your browser.
These are just two of the applications envisioned by promoters of the Worldwide Lexicon, an all-volunteer project founded by Brian McConnell. His concept: a distributed computing architecture drawing on nodes of participating PCs--and people--around the globe. At its heart is a simple protocol that links Web-based dictionaries, encyclopedias, and translation servers. It can even query human translators via a Gnutella-like peer-to-peer network.
"The Internet has eliminated physical boundaries, but there are still language boundaries," McConnell says, describing the project at the recent O'Reilly & Associates Inc. Emerging Technologies conference. He proposes the Worldwide Lexicon as a simple, effective way to provide access to the wealth of language resources on the Web. Potential volunteers can inquire at the site, and the first components are expected to go live this summer. As WWL project leader, even McConnell is a volunteer; he is a cofounder and developer at Trekmail.com, a dictation service that enables customers to send e-mail by telephone.
The truly innovative aspect of the project is its experiment in "distributed human computing," McConnell says. Just as the SETI@Home project taps the spare processing power of clients to support its search for extraterrestrial life, the WWL proposes to aggregate the spare language skills of people linked to the Internet. For example, someone proficient in English, Spanish, and Arabic would install a WWL client that senses when the person is available for a language query (based on criteria the volunteer sets). When a WWL server sends a translation request to an appropriate client, the human translator can choose to attempt a translation or to ignore the request.
The dictionary servers collect these translations, compare them with other responses, and determine whether to add the term to their databases. Over time, the translator volunteers would be assigned a kind of reliability rating that would give their responses more or less weight based on past accuracy.
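To make the weighting idea concrete, here is a minimal Python sketch of how a dictionary server might compare volunteer responses. The article does not describe the WWL's actual scoring rules, so the function name, the default weight for unrated translators, and the acceptance threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def pick_translation(responses, reliability, default_weight=0.5, threshold=0.6):
    """Return the winning translation, or None if no candidate is trusted enough.

    responses:   list of (translator_id, translation) pairs
    reliability: dict mapping translator_id -> weight between 0 and 1
    All names and numeric defaults here are hypothetical, not part of WWL.
    """
    scores = defaultdict(float)
    for translator_id, translation in responses:
        # Normalize lightly so trivial differences do not split the vote.
        key = translation.strip().lower()
        scores[key] += reliability.get(translator_id, default_weight)

    total = sum(scores.values())
    if total == 0:
        return None

    best, best_score = max(scores.items(), key=lambda item: item[1])
    # Accept only if the winner holds a large enough share of the total weight.
    return best if best_score / total >= threshold else None


responses = [("ana", "gato"), ("ben", "Gato"), ("cy", "felino")]
reliability = {"ana": 0.9, "ben": 0.7, "cy": 0.4}
print(pick_translation(responses, reliability))  # prints "gato"
```

In this sketch a translator with a strong track record can outvote several unrated ones, which is one plausible reading of giving responses "more or less weight based on past accuracy."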
The last piece of the puzzle is the development of applications that let people access WWL servers for language translation in near-real time, according to McConnell and others on the project. By embedding hooks to WWL in instant messaging clients, for example, people could request automatic machine translations of incoming and outgoing messages. They could also ask one of the volunteer client translators to translate slang, metaphors, or other words not in the dictionary.
Translation on Demand
McConnell foresees that the WWL will eventually translate news reports and other topical information automatically. Client translators could volunteer to work on small parts of long documents in a translation process called "segmentation." The separate translations are then recombined automatically and output as HTML.
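The segmentation step can be sketched in a few lines of Python. This is an illustration only: the article does not say how the WWL splits documents, so splitting on blank lines, tagging each piece with an index, and the function names `segment` and `recombine` are assumptions. A lookup table stands in for the volunteer translators.

```python
import html


def segment(document):
    """Split a document into numbered paragraphs (hypothetical splitting rule)."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return list(enumerate(paragraphs))


def recombine(translated):
    """Reassemble (index, text) pairs in original order and emit simple HTML."""
    ordered = sorted(translated, key=lambda pair: pair[0])
    return "\n".join(f"<p>{html.escape(text)}</p>" for _, text in ordered)


document = "Hello world.\n\nFish & chips."
segments = segment(document)

# Stand-in for volunteers: a lookup table, with results arriving out of order.
fake_translations = {"Hello world.": "Hola mundo.", "Fish & chips.": "Pescado & papas."}
translated = [(i, fake_translations[text]) for i, text in reversed(segments)]

page = recombine(translated)
print(page)
```

Carrying the index with each segment is what lets the pieces come back in any order, from any volunteer, and still be stitched together correctly.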
As a distributed-computing initiative, the WWL uses a small number of Simple Object Access Protocol (SOAP) methods to provide an interface for creating the required client and server applications. Upgrading a dictionary or translation server to support WWL requires only two simple scripts that respond to SOAP messages rather than CGI requests: one links to the so-called supernodes that list all available servers and clients for specific languages, and the other responds to client translation requests.
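For readers unfamiliar with SOAP, the following Python sketch shows roughly what a translation request and the server-side parsing might look like. Only the SOAP 1.1 envelope namespace is standard; the `Translate` method name, its parameters, and the `urn:example:wwl` namespace are invented for illustration, since the article does not list the actual WWL methods.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
WWL_NS = "urn:example:wwl"  # hypothetical namespace, not the real WWL one


def build_request(word, source_lang, target_lang):
    """Build a SOAP envelope carrying a hypothetical Translate call."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{WWL_NS}}}Translate")
    for name, value in (("word", word),
                        ("sourceLang", source_lang),
                        ("targetLang", target_lang)):
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Extract the call parameters, as a server-side script might."""
    root = ET.fromstring(xml_text)
    call = root.find(f"{{{SOAP_NS}}}Body/{{{WWL_NS}}}Translate")
    if call is None:
        raise ValueError("not a Translate request")
    return {child.tag: child.text for child in call}


message = build_request("gato", "es", "en")
print(parse_request(message))
```

A real deployment would send the envelope over HTTP and return a matching SOAP response; this sketch covers only the message format.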
McConnell expects the first supernodes will be running by the end of June. Beta testing of the client components will begin soon after, and the WWL client is scheduled to be widely distributed in the second half of 2002. The first WWL applications will likely be direct links to online translation services such as Babelfish, but McConnell hopes to soon enhance these sites with word-by-word translation functions through links to dictionary servers and volunteer human translators.
While assembling all the WWL components and recruiting sufficient translator volunteers would appear to be a daunting task, McConnell says he expects the project to succeed because it is a distributed environment: the work is spread and shared among application developers and translation service providers. The components include the directory servers (supernodes), the dictionary servers, the gateway servers that translate incoming SOAP/WWL requests, the translator clients, and the clients using the translation services.
Because each component is individually useful, the project can grow gradually, McConnell says. But as is the case with most open-source and peer-to-peer projects, WWL's ultimate success depends on what McConnell calls "a community of users committed to the project."