In-memory data management speeds retrieval

My first real Java application, back in 1997, was a servlet-based group scheduler. It wasn't quite the smash hit that Hanson's "MMMBop" was that summer, but as some of you may recall, it had its charms.

One of the things that fascinated me was the ease with which Java enabled me to manage our data in a memory-resident object and serialize it to disk when users made changes to their calendars. The application was, quite simply and elegantly I thought, little more than a Java Dictionary exposed for transactional use on the Web.
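The pattern is easy to sketch. The original was a Java servlet. What follows is a Python analogue, not that code: a plain dict stands in for the Java Dictionary, and the whole thing is re-serialized to disk after every change. The class name `CalendarStore`, its methods, and the use of `pickle` as the serializer are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


class CalendarStore:
    """Hypothetical sketch: a dict kept in memory and re-serialized to disk after every change."""

    def __init__(self, path):
        self.path = path
        # Load the last saved snapshot if one exists; otherwise start with an empty calendar.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.events = pickle.load(f)
        else:
            self.events = {}

    def get(self, user):
        # Reads come straight from memory; the disk is not touched.
        return list(self.events.get(user, []))

    def add_event(self, user, event):
        # Change the in-memory dict first, then persist the whole thing.
        self.events.setdefault(user, []).append(event)
        self._save()

    def _save(self):
        # Write to a temp file in the same directory, then rename it over the
        # old snapshot (os.replace is atomic on POSIX filesystems), so a crash
        # mid-write does not leave a half-written calendar behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.events, f)
        os.replace(tmp_path, self.path)


# Usage: add an event, then reload from disk as a restarted server would.
path = os.path.join(tempfile.mkdtemp(), "calendar.pickle")
store = CalendarStore(path)
store.add_event("alice", "staff meeting, Tuesday 10:00")
reloaded = CalendarStore(path)
print(reloaded.get("alice"))
```

Rewriting the entire snapshot on each change is the simplest approach that could work for a few dozen users. If the data ever grew large, this would likely be the first piece to swap out, for example for an object database.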

Kent Beck and Ward Cunningham, two leaders of the agile programming movement, would have been proud of me. Although I didn't know it at the time, I had embraced one of their central tenets: Do the Simplest Thing That Could Possibly Work.

I hadn't foreclosed any options. There were ways to scale the application if I needed to, and in fact, I later experimented with swapping out Java's native serializer for an industrial-strength object database. But as often turns out to be the case, there was never any need to fire that big cannon.

My group scheduler was an example of what Clay Shirky calls "situated software" -- an application that's used by, at most, dozens of people, and that needs agility more than it needs scalability. I've since revisited that strategy from time to time, most recently for several of the services I use to search my own blog.

In April 2003 I began accumulating all of my entries in a single XML file. I also run them through a publishing system to create Web pages and RSS feeds, but the XML file is my canonical archive. And although I've written more than 700 items since then, amounting to a third of a million words, the file doesn't yet exceed three megabytes.
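Those round numbers imply a rough per-word storage cost. Here the figures are treated as exact, and "three megabytes" is read as $3 \times 10^6$ bytes. Because the column says "more than 700" items and "doesn't yet exceed" three megabytes, both results are upper bounds:

```latex
\frac{\tfrac{1}{3} \times 10^{6}\ \text{words}}{700\ \text{entries}} \approx 476\ \text{words per entry},
\qquad
\frac{3 \times 10^{6}\ \text{bytes}}{\tfrac{1}{3} \times 10^{6}\ \text{words}} = 9\ \text{bytes per word}
```

In other words, each word costs on the order of nine bytes, including its share of the XML markup.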

It's entirely feasible to keep that corpus in memory, so I do. One instance of it backs my structured search service, which I use to run XPath queries over the collection. That gives me instant access to a variety of microformatted elements: quotes by Ward Cunningham, or code snippets in XSLT or Python.
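Here is a minimal sketch of structured search over an in-memory XML archive. It assumes a made-up schema with `quote` elements carrying an `author` attribute and `code` elements carrying a `lang` attribute, because the column does not describe the real file's markup. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath (paths, `//`, and `[@attr='value']` predicates). Full XPath 1.0 would need a library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive format; the real file's schema is not described in the column.
ARCHIVE = """<items>
  <item date="2005-03-01">
    <title>On simplicity</title>
    <quote author="Ward Cunningham">Do the simplest thing that could possibly work.</quote>
    <code lang="python">print("hello")</code>
  </item>
  <item date="2005-03-08">
    <title>Transforms</title>
    <code lang="xslt">&lt;xsl:value-of select="title"/&gt;</code>
  </item>
</items>"""

# Parse once at startup; every query afterward runs against this in-memory tree.
root = ET.fromstring(ARCHIVE)


def find_all(path):
    """Return the text of every element matching an ElementTree path expression."""
    return [el.text for el in root.findall(path)]


ward_quotes = find_all(".//quote[@author='Ward Cunningham']")
python_snippets = find_all(".//code[@lang='python']")
xslt_snippets = find_all(".//code[@lang='xslt']")
print(ward_quotes, python_snippets, xslt_snippets)
```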

Structured search is handy, but like everyone else I still regard good old-fashioned full-text search as my bread and butter. Until recently, I'd been relying on InfoWorld's Ultraseek engine. But because it crawls my site, which includes templated elements, the results aren't very precise. I wanted to search just the words I've written.

So now I load up another instance of the file and search that. The index? There isn't one. The service just rips through memory, finding substrings. It's blindingly fast. And charting my productivity alongside Moore's Law suggests this strategy won't run out of gas anytime soon.
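An index-free search can be as small as the following sketch. It again assumes a hypothetical `item`/`title`/`body` schema. Each entry is flattened once to lowercase plain text, so only the author's own words are searched and page templates are left out. Every query then scans all entries with a simple substring test. No index is built or maintained.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive; the real file's schema is not given in the column.
ARCHIVE = """<items>
  <item><title>Situated software</title><body>Clay Shirky coined the phrase.</body></item>
  <item><title>Structured search</title><body>XPath queries over the archive.</body></item>
  <item><title>Full text</title><body>No index, just a scan of memory.</body></item>
</items>"""


def load_entries(xml_text):
    """Flatten each item to (title, lowercased plain text), dropping all markup."""
    entries = []
    for item in ET.fromstring(xml_text).iter("item"):
        text = " ".join(item.itertext())
        entries.append((item.findtext("title"), text.lower()))
    return entries


# Loaded once and kept in memory for the life of the service.
ENTRIES = load_entries(ARCHIVE)


def search(term):
    """Brute-force scan: titles of entries containing term, case-insensitively."""
    needle = term.lower()
    return [title for title, text in ENTRIES if needle in text]


print(search("XPath"))
```

Each query costs time proportional to the size of the corpus. For a few megabytes of text held in RAM, that single pass is usually cheap enough that an index would add upkeep without a noticeable gain.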

When we consider the exponential growth of storage, we often forget that our most essential data is textual and numeric. And that stuff tends to grow only linearly. For example, my 2005 e-mail archive tops 100 megabytes, but a big chunk of it is PowerPoint attachments people have sent me. Boiled down to their textual and numeric essence, they'd occupy a fraction of the space.

There's nothing new about in-memory databases. They come in many different flavors, all of which are still fairly exotic, but emerging technologies such as Microsoft's LINQ (Language Integrated Query) promise to pull this approach into the mainstream. For our most vital and most volatile data, it's a strategy whose time has come.

Jon Udell

InfoWorld