Code search in action

A special-purpose search engine that can parse code can come in handy. Here are three real-world examples.

Programming is hard--always has been, always will be. No matter how experienced you are, or what kind of tools you use, there are always problems that make you gnash your teeth and think about getting an MBA or becoming a manager.

Modern psychology says we shouldn't dwell on negatives, but I'm going to intentionally bring up three gnarly problems that we've all had to deal with. Why? It's not that I'm masochistic. I think there's a search-oriented approach that will reduce the pain, without any medical side effects.

So what are these three problems? Leading off, we've got:

I already fixed this bug once

Don't you just hate that? You fixed the bug, and then it pops up again: same mistake, same fix, but in a different area of your code.

Or even worse, you fixed it, and there it is, still alive and creating problems in another branch. Heck, sometimes it's not even a branch--Joe Schmoe decided that it would be a good idea to fork your code for some other project, and you only find out about it when somebody figures out that you wrote the original and assigns you the bug report.

But often, just like in cheesy horror movies, there's foreshadowing. The premonition that something bad is going to happen. As you fix a bug, there's the little red light blinking in the back of your head, trying to tell you that this bug exists elsewhere. Could be in your code, could be in somebody else's code in your group, or in a pile of code you don't even know exists.

Now you can ignore the light. Stand up, grab a cup of coffee, check email, start an IM chat...and eventually the feeling goes away. Heck, maybe the CTO will decide to re-write everything in Ruby before the bug bites back. Kind of like ignoring the odd noise coming from the front-right fender. Most programmers are guys, and guys by definition are masters at ignoring chores.

And spending some extra time to figure out where else the bug might exist is a chore. No question about it. Then, if you find the bug, you have to fix it, or tell somebody else that it needs to be fixed. Who wants to make extra work?

But taking that extra step, going the extra mile, giving 108%--that's what separates real programmers from code monkeys. And it's a great way to build up some good karma points, which always come in handy when you break the build. Plus you can redeem them for valuable tchotchkes like bouncy balls with flashing lights inside, like the one I got at ApacheCon from Iona.

So now what, you ask? Well, if propagating bug fixes is going to suck, let's make it suck less. The easiest way to do this is to find clones--exact or almost exact copies of the file that you just modified. Level zero is exact matching, which works for pure clones. Krugle provides a step up from this (call it level one) by removing comments and stripping out formatting before calculating an MD5 hash, so at least changing tabs into spaces isn't going to make it look like a different file.
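
Here's a minimal sketch of that comment-and-formatting-insensitive hash. To be clear, this is my guess at the general idea, not Krugle's actual code: the function name normalized_md5 is made up, it only understands C-style comments, and it uses regular expressions where a real tool would use a proper parser.

```python
import hashlib
import re


def normalized_md5(source: str) -> str:
    """Return an MD5 hex digest of C-style source, ignoring comments and whitespace.

    Hypothetical helper for illustration only. The regexes will misfire on
    comment markers that appear inside string literals.
    """
    # Drop /* ... */ block comments first, then // line comments.
    no_block = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    no_comments = re.sub(r"//[^\n]*", "", no_block)
    # Remove all whitespace so tabs vs. spaces and re-indentation don't matter.
    squeezed = re.sub(r"\s+", "", no_comments)
    return hashlib.md5(squeezed.encode("utf-8")).hexdigest()


original = "int main() {\n\treturn 0; // done\n}\n"
reformatted = "/* entry point */\nint main() {\n    return 0;\n}\n"
print(normalized_md5(original) == normalized_md5(reformatted))
```

Throwing away every bit of whitespace is a blunt instrument (it can't tell "int x" from "intx"), but for spotting copied files that trade-off is usually acceptable. Two files with the same digest are clone candidates worth a look, not proof.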

Level two would be to use something like a winnowing algorithm to match files that have had some minor edits, things like a few new or modified lines. We're not there yet, but getting closer. Level three, aka Bruce Lee on steroids, would be to match up code at a function level, so that if Joe was really being a bad boy, and copied code at the function level, you'd still be able to find it. There are interesting techniques for doing this, but they're all pretty darn theoretical. And they wind up being computationally expensive. Or in other words, they're really, really slow.

Back to reality. What if it's not a clone? How do you find likely suspects for bug replication? Bugs come in many flavors, so here's a short list of techniques for some of the cases I've run into:
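
For the curious, here's roughly what the winnowing idea looks like. This is a simplified sketch, not anybody's production code: hash every run of k characters, slide a window of w hashes along the result, and keep the smallest hash in each window as a fingerprint. The names winnow and similarity, the defaults k=5 and w=4, and the Jaccard-style score are all my own choices for illustration. The full algorithm also records where each fingerprint came from, which I've left out.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every run of k consecutive characters in text."""
    return [
        int.from_bytes(hashlib.md5(text[i:i + k].encode("utf-8")).digest()[:8], "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(text: str, k: int = 5, w: int = 4) -> set[int]:
    """Return the set of fingerprints: the minimum hash in each window of w hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) <= w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Share of fingerprints two texts have in common (0.0 to 1.0)."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


base = "for (i = 0; i < n; i++) { total += values[i]; } return total / n;"
edited = base.replace("return", "log(total); return")
print(similarity(base, edited))
```

The point of keeping only the window minimums is that a small edit only disturbs the fingerprints near it, so a file with a few new lines still shares most of its fingerprints with the original. In practice you'd want to strip comments and whitespace before fingerprinting, and tune k and w to the kind of code you're matching.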

1. Assuming the bug involved the wanton misuse of an API, you could search for other places where that same function was called. If you're lucky, and the API uses named constants, then you might be able to easily find examples of the same misuse. Or maybe you want to quickly eyeball every place where strcat is called, because your company previously instituted a search-and-destroy approach to unbounded copy APIs like this one, but a few have snuck back in.
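
If you don't have a code search engine handy, a poor man's version of the strcat hunt can be sketched in a few lines. The function names here are hypothetical, and the match is purely textual: it has no idea what's a comment or a string literal, which is exactly the kind of thing a search engine that can parse code does better.

```python
import re
from pathlib import Path


def find_calls_in_text(text: str, func_name: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line that calls func_name."""
    # \b keeps my_strcat( from matching; strncat( never contains "strcat" at all.
    pattern = re.compile(r"\b" + re.escape(func_name) + r"\s*\(")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def find_calls_in_tree(root: str, func_name: str, suffixes=(".c", ".h")):
    """Walk a source tree and collect (path, line number, line) for each hit."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(
                (str(path), number, line)
                for number, line in find_calls_in_text(text, func_name)
            )
    return hits


sample = "char buf[8];\nstrcat(buf, name);\nstrncat(buf, name, 3);\n"
for number, line in find_calls_in_text(sample, "strcat"):
    print(number, line)
```

Calling find_calls_in_tree(".", "strcat") from the top of a checkout gives you the list to eyeball. It only sees the code on your disk, though, so Joe Schmoe's fork stays invisible.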

Ken Krugler

LinuxWorld