Accessing the web of databases

OpenLink Software's flagship product is a universal database and application server, Virtuoso.

I interviewed Kingsley Idehen, CEO of OpenLink, mainly to discuss Virtuoso's recent transition to open source, but our wide-ranging conversation helped me clarify a theme that has been central to my own work and that will dominate the next phase of the Internet's evolution. The Web is becoming a database -- or, more precisely, a network of databases. All of the trends that inform this column -- including Web services, REST (Representational State Transfer), AJAX (Asynchronous JavaScript and XML), and interpersonal as well as interprocess collaboration -- can be usefully refracted through that lens.

I've always regarded the Web as a programmable data source as well as a platform for the document/software hybrid that we call a Web page. Early on, programmable access to Web data entailed a lot of screen scraping. Nowadays it often still does, but it's becoming common to find APIs that serve up the Web's data. If you want to remix the InfoWorld metadata explorer, for example, as Mike Parsons did, you can fetch its data directly as XML.

Free text search is an even more popular access API. Nearly every site provides that service, or outsources it to Google or another engine.

And, of course, sites that act as database front ends support canned queries, the results of which may (if you're lucky) be accessible by way of APIs such as RSS.

What you can't typically do, though, is create mashups by running ad hoc queries against remote Web data. There are good reasons to think that it's just crazy to export open-ended query interfaces over the Web. No responsible enterprise DBA would permit such access to the crown jewels. But there are all kinds of data sources -- or what Idehen likes to call data spaces -- and a range of feasible and appropriate access modes.

Consider the data space that is my blog. I maintain the data as XML and provide open-ended query access by way of XPath. Want to extract the set of Python code fragments from my corpus? Be my guest; it's just a query on the URL line. Want to repurpose that data? Go for it -- the output of that query is well-formed XHTML that displays in the browser but is also software-friendly.
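To make the idea concrete, here is a minimal sketch in Python of XPath-style access to content stored as XML. The sample document, its element names, and the `class="python"` convention are assumptions made up for illustration, not the blog's actual markup or service. Python's standard ElementTree module also supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a tiny stand-in for a blog kept as XML.
BLOG_XML = """\
<blog>
  <item date="2006-04-01">
    <title>Parsing feeds</title>
    <body>
      <p>A short fragment:</p>
      <pre class="python">print("hello")</pre>
    </body>
  </item>
  <item date="2006-04-08">
    <title>Shell notes</title>
    <body>
      <pre class="shell">ls -l</pre>
      <pre class="python">x = 1 + 1</pre>
    </body>
  </item>
</blog>
"""


def run_query(xml_text, xpath):
    """Parse the document and return the elements matching the path."""
    root = ET.fromstring(xml_text)
    return root.findall(xpath)


def python_fragments(xml_text):
    """Extract the text of every <pre> element marked as Python code."""
    return [el.text for el in run_query(xml_text, ".//pre[@class='python']")]


if __name__ == "__main__":
    for fragment in python_fragments(BLOG_XML):
        print(fragment)
```

A Web-facing version would take the path expression from the URL and serialize the matching elements back out as XHTML. That part is left out here because the details of the real service are not described.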

If you're clever, you can probably write an XPath query that will stall or crash my service. If you do, one minor node of an emerging network of Web databases will drop off the grid until I notice the problem and restart it. But it won't ruin your day or mine. And as we gain more experience with these modes of access, we'll learn how to make them more resilient to attack.

The holistic view of that network should be our focus. In Idehen's view, you'll use something like SPARQL -- a query language for the semantic Web -- to traverse a graph of interlinked sites, and to merge interesting sources into a virtual collection. Then you'll dispatch queries to each member of that collection. They'll offer query styles ranging from free-text search to iteration over simple key/value pairs (accessed by way of RSS or Atom), tree traversal (XPath, XQuery), and relational query (SQL). I think he's got it exactly right.
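The dispatch step might be sketched as follows in Python. This is an illustration under stated assumptions, not Virtuoso's design: the collection is hard-coded rather than discovered with SPARQL, every source is in-memory sample data, and the handler names are hypothetical. Each member of the collection answers in its own query style.

```python
import sqlite3
import xml.etree.ElementTree as ET


def text_search(docs, term):
    """Free-text search: case-insensitive substring match."""
    return [d for d in docs if term.lower() in d.lower()]


def key_value(entries, key):
    """Key/value iteration, as over the entries of an RSS or Atom feed."""
    return [e[key] for e in entries if key in e]


def tree_query(xml_text, path):
    """Tree traversal using ElementTree's XPath subset."""
    return [el.text for el in ET.fromstring(xml_text).findall(path)]


def sql_query(rows, sql):
    """Relational query against a throwaway in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (title TEXT, year INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result


# Hypothetical virtual collection: (name, handler, sample data, query).
COLLECTION = [
    ("text", text_search,
     ["Virtuoso goes open source", "AJAX tips"], "virtuoso"),
    ("feed", key_value,
     [{"title": "Post one"}, {"title": "Post two"}], "title"),
    ("tree", tree_query,
     "<blog><item><title>XPath notes</title></item></blog>", ".//title"),
    ("sql", sql_query,
     [("SQL old", 2005), ("SQL new", 2006)],
     "SELECT title FROM posts WHERE year = 2006"),
]


def dispatch(collection):
    """Send each member its own style of query and gather the results."""
    return {name: handler(data, query)
            for name, handler, data, query in collection}


if __name__ == "__main__":
    for name, hits in dispatch(COLLECTION).items():
        print(name, hits)
```

The point of the sketch is only that one caller can hold a mixed collection and let each source answer in the style it supports.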


Jon Udell
