Code search in action

A special-purpose search engine that can parse code can come in handy. Here are three real-world examples.

Programming is hard. Always has been, always will be. No matter how experienced you are, or what kind of tools you use, there are always problems that make you gnash your teeth and think about getting an MBA or becoming a manager.

Modern psychology says we shouldn't dwell on negatives, but I'm going to intentionally bring up three gnarly problems that we've all had to deal with. Why? It's not that I'm masochistic. I think there's a search-oriented approach that will reduce the pain, without any medical side effects.

So what are these three problems? Leading off, we've got:

I already fixed this bug once

Don't you just hate that? You fixed the bug, and then it pops up, same mistake, same fix, but in a different area of your code.

Or even worse, you fixed it, and there it is, still alive and creating problems in another branch. Heck, sometimes it's not even a branch: Joe Schmoe decided it would be a good idea to fork your code for some other project, and you only find out when somebody figures out you wrote it and assigns you the bug report.

But often, just like in cheesy horror movies, there's foreshadowing: the premonition that something bad is going to happen. As you fix a bug, there's a little red light blinking in the back of your head, trying to tell you that this bug exists elsewhere. It could be in your code, in code from somebody else in your group, or in a pile of code you don't even know exists.

Now you can ignore the light. Stand up, grab a cup of coffee, check email, start an IM chat... and eventually the feeling goes away. Heck, maybe the CTO will decide to rewrite everything in Ruby before the bug bites back. Kind of like ignoring the odd noise coming from the front-right fender. Most programmers are guys, and guys by definition are masters at ignoring chores.

And spending some extra time to figure out where else the bug might exist is a chore. No question about it. Then, if you find the bug, you have to fix it, or tell somebody else that it needs to be fixed. Who wants to make extra work?

But taking that extra step, going the extra mile, giving 108%: that's what separates real programmers from code monkeys. And it's a great way to build up some good karma points, which always come in handy when you break the build. Plus you can redeem them for valuable tchotchkes like bouncy balls with flashing lights inside, like the one I got at ApacheCon from Iona.

So now what, you ask? Well, if propagating bug fixes is going to suck, let's make it suck less. The easiest way to do this is to find clones: exact or almost exact copies of the file that you just modified. Level zero is exact matching, which only works for pure clones. Level one, which is what Krugle provides, removes comments and strips out formatting before calculating an MD5 hash, so at least changing tabs into spaces isn't going to make it look like a different file.
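A minimal Python sketch of levels zero and one follows. It is not Krugle's implementation: the comment-stripping regexes assume C-family `//` and `/* */` comments and do not understand string literals, and `exact_fingerprint`, `normalized_fingerprint`, and `find_clones` are hypothetical names chosen for illustration.

```python
import hashlib
import re

# Naive patterns for C-family comments. They ignore string literals, so a
# "//" inside a quoted string would be stripped as well (an assumption).
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")
_WHITESPACE = re.compile(r"\s+")


def exact_fingerprint(source: str) -> str:
    """Level zero: hash the file exactly as it is."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()


def normalized_fingerprint(source: str) -> str:
    """Level one: drop comments and formatting, then hash what is left."""
    text = _BLOCK_COMMENT.sub(" ", source)
    text = _LINE_COMMENT.sub(" ", text)
    # Collapse every run of tabs, spaces, and newlines into a single space.
    text = _WHITESPACE.sub(" ", text).strip()
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_clones(files: dict[str, str], fingerprint=normalized_fingerprint) -> list[list[str]]:
    """Group file names whose fingerprints are identical."""
    groups: dict[str, list[str]] = {}
    for name, source in files.items():
        groups.setdefault(fingerprint(source), []).append(name)
    # Only buckets holding two or more files are clones.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Bucketing by hash keeps the lookup cheap, since each file is hashed once and clones land in the same bucket. The trade-off is that changing a single token produces a completely different hash.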

Level two would be to use something like a winnowing algorithm to match files with minor edits, such as a few new or modified lines. We're not there yet, but we're getting closer. Level three, aka Bruce Lee on steroids, would be to match up code at the function level, so that if Joe was really being a bad boy and copied individual functions rather than whole files, you'd still be able to find them. There are interesting techniques for doing this, but they're all pretty darn theoretical, and they wind up being computationally expensive. In other words, they're really, really slow.
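The sketch below shows the core of winnowing, the fingerprinting technique behind the MOSS plagiarism detector: hash every k-character substring (k-gram) of the whitespace-stripped source, then keep the minimum hash from each window of `w` consecutive k-gram hashes. It is an illustration, not Krugle's code. The defaults `k=5` and `w=4` are arbitrary assumptions, per-k-gram MD5 stands in for the rolling hash a production version would use, and scoring files by the Jaccard similarity of their fingerprint sets is just one simple choice.

```python
import hashlib
import re


def _kgram_hashes(text: str, k: int) -> list[int]:
    """64-bit hash of every k-character substring (k-gram) of text."""
    return [
        int.from_bytes(hashlib.md5(text[i:i + k].encode("utf-8")).digest()[:8], "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(source: str, k: int = 5, w: int = 4) -> set[int]:
    """Fingerprints: the minimum k-gram hash of each window of w hashes."""
    # Ignore formatting entirely: whitespace never contributes to a k-gram.
    text = re.sub(r"\s+", "", source)
    hashes = _kgram_hashes(text, k)
    if len(hashes) < w:
        # Too short for a full window; fall back to the overall minimum.
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of two fingerprint sets (1.0 means identical sets)."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

Because every window contributes its minimum, any run of at least `w + k - 1` matching characters (8 with these defaults) is guaranteed to yield a shared fingerprint, so a few edited lines lower the score instead of hiding the match entirely.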

Back to reality. What if it's not a clone? How do you find likely suspects for bug replication? Bugs come in many flavors, so here's a short list of techniques for some of the cases I've run into:

1. Assuming the bug involved the wanton misuse of an API, you could search for other places where that same function is called. If you're lucky and the API uses named constants, you might be able to easily find examples of the same misuse. Or maybe you want to quickly eyeball every call to strcat, since your company previously instituted a search-and-destroy approach to unbounded copy APIs like this one, but a few have snuck back in.
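For the strcat case, even a plain text scan is a start. The Python sketch below flags calls to a hypothetical ban list of unbounded functions in `.c` and `.h` files. It uses a regular expression rather than a real parser, so unlike a search engine that parses code it will also report matches inside comments and string literals. `BANNED_CALLS`, `scan_source`, and `find_calls` are illustrative names, not part of any tool mentioned here.

```python
import re
from pathlib import Path

# Hypothetical ban list of unbounded copy/read functions.
BANNED_CALLS = ("strcat", "strcpy", "sprintf", "gets")

# The leading \b keeps names like my_strcat( or fgets( from matching.
_CALL = re.compile(r"\b(" + "|".join(BANNED_CALLS) + r")\s*\(")

Hit = tuple[str, int, str, str]  # (file, line number, function, line text)


def scan_source(name: str, text: str) -> list[Hit]:
    """Report every banned call in one file's text, line by line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _CALL.finditer(line):
            hits.append((name, lineno, match.group(1), line.strip()))
    return hits


def find_calls(root: str, suffixes: tuple[str, ...] = (".c", ".h")) -> list[Hit]:
    """Walk root recursively and scan every C source or header file."""
    hits: list[Hit] = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(scan_source(str(path), text))
    return hits
```

For example, `find_calls("src")` would return one tuple per match under a `src` directory. Note that `strncat(` never matches, since it is a different name from `strcat`.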

Ken Krugler