In-memory data management speeds retrieval

My first real Java application, back in 1997, was a servlet-based group scheduler. It wasn't quite the smash hit that Hanson's "MMMBop" was that summer, but as some of you may recall, it had its charms.

One of the things that fascinated me was the ease with which Java enabled me to manage our data in a memory-resident object and serialize it to disk when users made changes to their calendars. The application was, quite simply and elegantly I thought, little more than a Java Dictionary exposed for transactional use on the Web.
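The pattern described above, a dictionary held in memory and written to disk whenever it changes, can be sketched in a few lines. This is an illustration in Python rather than the Java of the original application, and the class name `PersistentDict`, its methods, and the file layout are all hypothetical, not the scheduler's actual code.

```python
import os
import pickle
import tempfile


class PersistentDict:
    """A dict kept in memory and rewritten to disk after every change.

    Hypothetical illustration only; not the original scheduler's design.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Reload the previous state if a saved file already exists.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.data = pickle.load(f)

    def get(self, key, default=None):
        # Reads are served straight from memory.
        return self.data.get(key, default)

    def put(self, key, value):
        # Every write updates memory, then serializes the whole dict.
        self.data[key] = value
        self._save()

    def _save(self):
        # Write to a temporary file and rename it into place, so a crash
        # in mid-write cannot leave a half-written archive behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.data, f)
        os.replace(tmp, self.path)
```

Rewriting the whole file on every change is only reasonable while the data stays small, which is the situation the column describes. A multi-threaded server would also need a lock around `put`.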

Kent Beck and Ward Cunningham, two leaders of the agile programming movement, would have been proud of me. Although I didn't know it at the time, I had embraced one of their central tenets: Do the Simplest Thing That Could Possibly Work.

I hadn't foreclosed any options. There were ways to scale the application if I needed to, and in fact, I later experimented with swapping out Java's native serializer for an industrial-strength object database. But as often turns out to be the case, there was never any need to fire that big cannon.

My group scheduler was an example of what Clay Shirky calls "situated software" -- an application that's used by, at most, dozens of people, and that needs agility more than it needs scalability. I've since revisited that strategy from time to time, most recently for several of the services I use to search my own blog.

In April 2003 I began accumulating all of my entries in a single XML file. I also run them through a publishing system to create Web pages and RSS feeds, but the XML file is my canonical archive. And although I've written more than 700 items since then, amounting to a third of a million words, the file doesn't yet exceed three megabytes.

It's entirely feasible to keep that corpus in memory, so I do. One instance of it backs my structured search service, which I use to run XPath queries over the collection. That gives me instant access to a variety of microformatted elements: quotes by Ward Cunningham, or code snippets in XSLT or Python.
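As a rough sketch of what XPath queries over an in-memory archive can look like, the Python below parses a tiny XML document once and then queries it. The element names, the `class`, `source`, and `lang` attributes, and the sample entries are invented for illustration; the real archive's markup is not shown in this column. Note also that the standard library's `xml.etree.ElementTree` supports only a subset of XPath, so a fuller query language would need another library.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive: the structure and attribute names are assumptions.
ARCHIVE = """<entries>
  <entry date="2005-01-10">
    <title>Simple things</title>
    <body>
      <p>Some thoughts on simplicity.</p>
      <blockquote class="quote" source="Ward Cunningham">Do the simplest thing that could possibly work.</blockquote>
    </body>
  </entry>
  <entry date="2005-02-02">
    <title>Hello again</title>
    <body>
      <pre class="code" lang="python">print("hello")</pre>
    </body>
  </entry>
</entries>"""

# Parse once; every later query runs against this in-memory tree.
root = ET.fromstring(ARCHIVE)


def quotes_by(tree, person):
    """Return the text of every quote attributed to `person`."""
    # `person` is interpolated naively; a name containing a single
    # quote would need escaping first.
    path = f".//blockquote[@class='quote'][@source='{person}']"
    return [q.text for q in tree.findall(path)]


def code_snippets(tree, language):
    """Return the text of every code snippet tagged with `language`."""
    path = f".//pre[@class='code'][@lang='{language}']"
    return [c.text for c in tree.findall(path)]
```

Calling `quotes_by(root, "Ward Cunningham")` walks the tree and returns the one matching quote in the sample above; nothing is read from disk at query time.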

Structured search is handy, but like everyone else I still regard good old-fashioned full-text search as my bread and butter. Until recently, I'd been relying on InfoWorld's Ultraseek engine. But because it crawls my site, which includes templated elements, the results aren't very precise. I wanted to search just the words I've written.

So now I load up another instance of the file and search that. The index? There isn't one. The service just rips through memory, finding substrings. It's blindingly fast. And charting my productivity alongside Moore's Law suggests this strategy won't run out of gas anytime soon.
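An index-free search of this kind can be sketched as a plain substring scan. The Python below is an assumption-laden illustration, not the service's actual code: the `entry`, `title`, and `body` element names and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_entries(xml_text):
    """Parse the archive once and keep (title, lowercased text) pairs."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        title = entry.findtext("title", default="")
        body = entry.find("body")
        # Flatten the body markup to plain text and collapse whitespace,
        # so only the written words are searched.
        raw = "".join(body.itertext()) if body is not None else ""
        text = " ".join(raw.split()).lower()
        entries.append((title, text))
    return entries


def search(entries, phrase):
    """Return titles of entries containing `phrase`, ignoring case."""
    needle = phrase.lower()
    # No index: each query is a linear scan over every entry in memory.
    return [title for title, text in entries if needle in text]
```

Each query costs time proportional to the size of the corpus, which is why the approach suits an archive of a few megabytes and would need rethinking for a much larger one.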

When we consider the exponential growth of storage, we often forget that our most essential data is textual and numeric. And that stuff tends to grow only linearly. For example, my 2005 e-mail archive tops 100 megabytes, but a big chunk of it is PowerPoint attachments people have sent me. Boiled down to their textual and numeric essence, they'd occupy a fraction of the space.

There's nothing new about in-memory databases. They come in many different flavors, all of which are still fairly exotic, but emerging technologies such as Microsoft's LINQ (Language Integrated Query) promise to pull this approach into the mainstream. For our most vital and most volatile data, it's a strategy whose time has come.

Jon Udell

InfoWorld