Social context for data analysis

How data analysis could work productively on the Web

I'm a huge fan of the CAPStat (formerly DCStat) program, but despite my cheerleading, the hoped-for citizen-led mashups haven't yet materialized in a big way.

In principle, the data is there for the taking, and there's an open invitation for anyone to scoop it up and do useful analysis. In practice, only half the battle is won, and that half thanks to the immediate availability of data represented as RSS, Atom, and the district's own, richer flavor of XML. It's great to lay your hands on the data, but as Bob Glushko rightly insists on reminding me, XML only seems to be a self-describing format. What do tags or field names really mean? Which elements or fields are or are not comparable? We can only answer these questions by pointing to instances of data (records, documents), discussing them, and coming to agreements.

Lately, I'm seeing some intriguing glimpses of how that process could work productively on the Web. One stunning example is Dabble DB, which enables you to pluck data right from the surface of a Web page and inject it into a shareable Web database. Once it's there, the whole panoply of Web-2.0-style techniques -- linking, tagging, blogging -- can support a loosely coupled conversation about the provenance and the semantics of the data.

Today I found another piece of the puzzle -- a new site called Swivel. It's done in the standard Web 2.0 style, complete with regulation Flickr-blue search buttons and Ruby on Rails URL syntax. To tell you the truth, I'm not sure how useful it'll turn out to be. But the idea at the core of Swivel -- inviting people to publish, annotate, and share datasets -- is spot on.

As a first experiment, I grabbed the CAPStat reported-crime feed for November, sucked it into Excel 2003, consolidated incidents by day, pivoted them on type of offense (homicide, burglary, and so on), and exported them back out as a CSV (comma-separated values) file that Swivel could import. The service immediately produced a chart for each of the nine crime types in my data set. Eventually the site will "swivel" my data, a process of further analysis that it assures me will be "worth the wait." I dunno, maybe -- I'm not holding my breath. Poking around, I haven't found any breathtaking examples of mechanical insight.
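For readers who would rather script that step than do it in Excel, here is a minimal sketch of the same consolidate-by-day, pivot-on-offense, export-to-CSV routine in Python. It is not what I actually ran. The field names `date` and `offense`, the helper names `pivot_incidents` and `to_csv`, and the sample records are all invented for illustration, so the real feed's fields would need to be mapped onto them first.

```python
import csv
import io
from collections import Counter, defaultdict


def pivot_incidents(incidents):
    """Count incidents per day, with one column per offense type.

    `incidents` is a list of dicts with hypothetical keys "date" and
    "offense"; the real feed's field names may differ.
    """
    counts = defaultdict(Counter)  # day -> offense -> number of incidents
    offenses = set()
    for rec in incidents:
        counts[rec["date"]][rec["offense"]] += 1
        offenses.add(rec["offense"])
    columns = sorted(offenses)
    # A Counter returns 0 for offenses that did not occur on a given day.
    rows = [[day] + [counts[day][o] for o in columns] for day in sorted(counts)]
    return ["date"] + columns, rows


def to_csv(header, rows):
    """Serialize the pivoted table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample records, not real CAPStat data.
sample = [
    {"date": "2006-11-01", "offense": "burglary"},
    {"date": "2006-11-01", "offense": "burglary"},
    {"date": "2006-11-01", "offense": "homicide"},
    {"date": "2006-11-02", "offense": "burglary"},
]

header, rows = pivot_incidents(sample)
csv_text = to_csv(header, rows)
print(csv_text)
```

Each row of the result is one day and each column one offense type, which is the shape a charting service needs to draw one series per crime type.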

But there's something a lot simpler, yet I think also a lot more useful, going on here. The charts are fun to look at, but it's the data (and the source attributions) that really matter. When it's parked in the cloud, other people can find it by way of search terms ('washington,' 'burglary,' 'arson,' 'dcstat'). And whether Dabble DB massages it online or Excel does so locally, they can gather around a common URL to discuss how to use and interpret the data.

Data analysis is an inherently social act. Until now it has lacked an appropriate social context. But that's going to change -- and soon, I hope.

Jon Udell
