The Semantic Web could be the key to unlocking scientific data that is sequestered by disparate application formats and organizational limitations, and could allow scientists to harness the full power of computation, World Wide Web inventor Tim Berners-Lee said Tuesday.
The Semantic Web "will give scientists and other users unexpected help and serendipitous added value from others' data," Berners-Lee, director of the World Wide Web Consortium (W3C), said at the Fourth Annual Bio-IT World Conference and Expo in Boston. The Semantic Web seeks to make it easier for data on the Web to be shared and reused by people and applications.
The Semantic Web is based on the W3C's Resource Description Framework (RDF), which describes data as simple machine-readable statements and commonly uses XML (Extensible Markup Language) to exchange them between applications. On the Semantic Web, documents and the contents of databases have to be published in a machine-processable form, creating a kind of global database.
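To make the "global database" idea more concrete: RDF models information as subject-predicate-object statements, usually called triples. The minimal Python sketch below uses plain tuples rather than a real RDF library, and every identifier in it (`ex:proteinP1`, `ex:geneG1` and so on) is a hypothetical placeholder, not a real vocabulary term. It only illustrates how statements published separately by two sources can be merged and then followed across sources when they share identifiers.

```python
# A minimal sketch of the RDF triple model using plain Python tuples.
# All identifiers below are hypothetical examples, not real vocabularies.
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Statements published by one hypothetical source, e.g. a protein catalog.
source_a = {
    Triple("ex:proteinP1", "ex:encodedBy", "ex:geneG1"),
    Triple("ex:proteinP1", "ex:name", "Example protein"),
}

# Statements published independently by another hypothetical source.
source_b = {
    Triple("ex:geneG1", "ex:associatedWith", "ex:diseaseD1"),
    Triple("ex:proteinP1", "ex:encodedBy", "ex:geneG1"),  # same statement as in source_a
}

# Because both sources use the same identifiers, merging is a set union;
# the repeated statement collapses into one.
merged = source_a | source_b


def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return matching triples in sorted order; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


# Follow links across both sources: protein -> gene -> disease.
for _, _, gene in match(merged, s="ex:proteinP1", p="ex:encodedBy"):
    for _, _, disease in match(merged, s=gene, p="ex:associatedWith"):
        print("ex:proteinP1", "->", gene, "->", disease)
```

Real Semantic Web data uses URIs as identifiers and a dedicated RDF toolkit; the set union here only shows why shared identifiers let independently published data combine without custom integration code.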
Life scientists in particular could find the Semantic Web a useful tool, and in so doing, "provide leadership to lots of other fields" in implementing this next-generation Web technology, Berners-Lee said. "At the moment, I see a huge amount of energy from people in life sciences, getting excited by the Semantic Web and what it can do to solve the big-idea problems."
Berners-Lee, who invented key components of the World Wide Web such as HTTP (Hypertext Transfer Protocol) and HTML (Hypertext Markup Language) in the late 1980s, has long envisioned an extension of the organic, unstructured Web. The W3C launched its first Semantic Web projects in the late 1990s, adding metadata to Web pages.
Berners-Lee hopes that life sciences will drive adoption of the Semantic Web, just as high-energy physics drove the early Web.
"Maybe we will meet a critical mass in a certain area. The Web, for example, took off in high-energy physics. When we got six high-energy physics Web sites, then it got interesting for physicists to be onboard," he said. "Similarly, if we could get critical mass in life sciences, if we get a half a dozen or a dozen set of ontologies, the core ones for drug discovery out there, then suddenly the Semantic Web within life sciences would have a critical mass. It'll snowball much more rapidly and it will be copied. Other areas will realize: Oh, it's worth investing in this," Berners-Lee said.
The life sciences are particularly well suited to pioneering the Semantic Web, Berners-Lee said. In drug discovery, for example, many of the databases and information systems that researchers use are already in machine-readable formats or are ready to be converted to them.
Two examples are the Biological Pathways Exchange, which is developing a standard data exchange format for metabolic, signaling, genetic regulatory and genetic pathway information, and the Universal Protein Resource (Uniprot), which combines information from several catalogs of protein data.
"In many cases, like Uniprot, the ontology [controlled vocabulary and hierarchical data structure] exists, the modeling has already been done," Berners-Lee said.
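As a rough illustration of the "hierarchical data structure" part of an ontology, the sketch below encodes a tiny class hierarchy as subclass links and answers whether one term is a kind of another by following those links transitively. The class names and the `SUBCLASS_OF`, `ancestors` and `is_a` names are invented for illustration; they are not taken from Uniprot or any real ontology.

```python
# A hypothetical mini-ontology: each term maps to its direct parent terms.
# Names are illustrative only, not drawn from a real ontology.
SUBCLASS_OF: dict[str, set[str]] = {
    "Kinase": {"Enzyme"},
    "Enzyme": {"Protein"},
    "Receptor": {"Protein"},
    "ReceptorTyrosineKinase": {"Kinase", "Receptor"},
    "Protein": {"Molecule"},
}


def ancestors(term: str) -> set[str]:
    """Collect every term reachable by following subclass links (transitive closure)."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(term: str, category: str) -> bool:
    """True if term equals category or is a direct or indirect subclass of it."""
    return term == category or category in ancestors(term)


print(is_a("ReceptorTyrosineKinase", "Enzyme"))  # True
print(is_a("Receptor", "Enzyme"))  # False
```

Because a query for a broad category such as "Enzyme" can automatically include every more specific kind of enzyme, data annotated with a shared vocabulary can be searched by category rather than by exact label.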
Biodash, a Semantic Web prototype of a drug development dashboard, associates diseases, drug progression stages, molecular biology and pathway knowledge for users. A team of representatives from the W3C, IBM Corp., Oracle Corp., University of Colorado and others developed the prototype. It includes a Semantic Web browser connecting information from public sources and chemical libraries with biological entities such as genes, proteins and pathways.
Berners-Lee does not promise a quick return on investment for those who format their data for the Semantic Web, and he admits that the concept is "quite difficult to explain." However, he faced the same problem when trying to explain the World Wide Web 15 years ago: "'Hypertext pages; big deal!' people said. They couldn't realize how they would be able to link to potentially anything and what that would mean."
Asked when the Semantic Web will take off, Berners-Lee said: "You tell me. I spend all my energy just telling people what I would like to see happen. What I think will happen is much more dangerous."