In-memory data management speeds retrieval

My first real Java application, back in 1997, was a servlet-based group scheduler. It wasn't quite the smash hit that Hanson's "MMMBop" was that summer, but as some of you may recall, it had its charms.

One of the things that fascinated me was the ease with which Java enabled me to manage our data in a memory-resident object and serialize it to disk when users made changes to their calendars. The application was, quite simply and elegantly I thought, little more than a Java Dictionary exposed for transactional use on the Web.
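To make the pattern concrete, here is a minimal sketch of a memory-resident dictionary that is rewritten to disk whenever it changes. It is written in Python rather than the Java of the original application, and every name in it (PersistentCalendar, add, events_for) is hypothetical, not taken from the scheduler itself.

```python
import os
import pickle
import tempfile


class PersistentCalendar:
    """Hypothetical store: a dict held in memory, serialized on every change."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        # Reload the previous state if a serialized copy already exists.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.entries = pickle.load(f)

    def add(self, user, event):
        # Mutate the in-memory structure, then persist the whole thing.
        self.entries.setdefault(user, []).append(event)
        self._save()

    def events_for(self, user):
        return list(self.entries.get(user, []))

    def _save(self):
        # Write to a temporary file in the same directory, then swap it in,
        # so a crash mid-write does not leave a truncated data file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.entries, f)
        os.replace(tmp_path, self.path)
```

Rewriting the whole file on each change is wasteful in principle, but for data measured in kilobytes or a few megabytes it is usually too cheap to matter, which is the point of the approach.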

Kent Beck and Ward Cunningham, two leaders of the agile programming movement, would have been proud of me. Although I didn't know it at the time, I had embraced one of their central tenets: Do the Simplest Thing That Could Possibly Work.

I hadn't foreclosed any options. There were ways to scale the application if I needed to, and in fact, I later experimented with swapping out Java's native serializer for an industrial-strength object database. But as often turns out to be the case, there was never any need to fire that big cannon.

My group scheduler was an example of what Clay Shirky calls "situated software" -- an application that's used by, at most, dozens of people, and that needs agility more than it needs scalability. I've since revisited that strategy from time to time, most recently for several of the services I use to search my own blog.

In April 2003 I began accumulating all of my entries in a single XML file. I also run them through a publishing system to create Web pages and RSS feeds, but the XML file is my canonical archive. And although I've written more than 700 items since then, amounting to a third of a million words, the file doesn't yet exceed three megabytes.

It's entirely feasible to keep that corpus in memory, so I do. One instance of it backs my structured search service, which I use to run XPath queries over the collection. That gives me instant access to a variety of microformatted elements: quotes by Ward Cunningham, or code snippets in XSLT or Python.
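A structured search of this kind might look roughly like the following sketch. The element and attribute names (item, quote, source, code, lang) and the sample text are invented for illustration, since the real archive's markup isn't shown here. Note also that Python's built-in ElementTree supports only a subset of XPath; a full XPath engine would allow richer queries.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for the real archive file.
SAMPLE = """<entries>
  <item title="Simplicity">
    <quote source="Ward Cunningham">Placeholder quote one.</quote>
    <code lang="python">print('hi')</code>
  </item>
  <item title="Transforms">
    <quote source="Kent Beck">Placeholder quote two.</quote>
    <code lang="xslt">placeholder stylesheet fragment</code>
  </item>
</entries>"""

# Parse once and keep the tree in memory; every query reuses it.
CORPUS = ET.fromstring(SAMPLE)


def quotes_by(root, source):
    """Return the text of every quote attributed to the given source."""
    # The source is interpolated directly into the path, which is fine for a
    # sketch but would need escaping if the value could contain a quote mark.
    return [q.text for q in root.findall(f".//quote[@source='{source}']")]


def snippets_in(root, lang):
    """Return the text of every code snippet tagged with the given language."""
    return [c.text for c in root.findall(f".//code[@lang='{lang}']")]
```

With the sample above, quotes_by(CORPUS, "Ward Cunningham") returns the one matching quote, and snippets_in(CORPUS, "python") returns the one Python snippet.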

Structured search is handy, but like everyone else, I still regard good old-fashioned full-text search as my bread and butter. Until recently, I'd been relying on InfoWorld's Ultraseek engine. But because it crawls my site, which includes templated elements, the results aren't very precise. I wanted to search just the words I've written.

So now I load up another instance of the file and search that. The index? There isn't one. The service just rips through memory, finding substrings. It's blindingly fast. And charting my productivity alongside Moore's Law suggests this strategy won't run out of gas anytime soon.
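The index-free approach can be sketched in a few lines. This is an assumption about the general shape of such a service, not the actual implementation; the class name InMemorySearch and the (title, text) entry format are hypothetical.

```python
class InMemorySearch:
    """Hypothetical index-free search: scan every entry for a substring."""

    def __init__(self, entries):
        # entries: iterable of (title, text) pairs.
        # A lowercased copy is kept so each query avoids re-lowercasing
        # the whole corpus.
        self._entries = [(title, text.lower()) for title, text in entries]

    def find(self, term):
        """Return titles of entries whose text contains term, ignoring case."""
        needle = term.lower()
        return [title for title, lowered in self._entries if needle in lowered]


# Invented sample entries for illustration.
search = InMemorySearch([
    ("Situated software", "Agility matters more than scalability here."),
    ("In-memory data", "The service just rips through memory."),
])
hits = search.find("MEMORY")
```

Each query is a linear scan, so its cost grows with the size of the corpus. At a few megabytes of text, that scan completes quickly enough that building and maintaining an index would add complexity without a noticeable payoff.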

When we consider the exponential growth of storage, we often forget that our most essential data is textual and numeric. And that stuff tends to grow only linearly. For example, my 2005 e-mail archive tops 100 megabytes, but a big chunk of it is PowerPoint attachments people have sent me. Boiled down to their textual and numeric essence, they'd occupy a fraction of the space.

There's nothing new about in-memory databases. They come in many different flavors, all of which are still fairly exotic, but emerging technologies such as Microsoft's LINQ (Language Integrated Query) promise to pull this approach into the mainstream. For our most vital and most volatile data, it's a strategy whose time has come.

Jon Udell

InfoWorld