Blog spammers have found ways to automate inserting their unwanted messages into online conversations, but the few tools available to block them lag woefully behind.
"How far ahead [of us] are the spammers? Who knows," says Jessica Baumgart, an affiliate with Harvard University's Berkman Center for Internet and Society, who gave a presentation on blog spam at the MIT Spam Conference 2007 held in Cambridge Friday. "Any time we try to block them out, they find a way to get in. We'll do something and five minutes later they're back. It's like playing chess."
According to Baumgart, who has been involved with Harvard's blogging initiative for seven years and manages tens of blogs on seven different platforms, there are three main ways spammers get their messages into blogs:
- Comment spam: spammers are either paid to surf the Web and manually type comments into the blogs they find, or they write scripts that enter the text automatically. These comments can be hard to distinguish from legitimate entries, Baumgart says, except that they're often off the blog's topic and include a link to a Web site.
- Trackback spam: spammers develop scripts that use trackback links to place spam on blogs. A blog's trackback feature lets readers automatically notify a site that they have linked to its pages. Trackback spam consists of links to random Web sites, many of which "are things you don't necessarily want to see" as the blog host or participant, Baumgart says.
- Spam blogs, or splogs: spammers take advantage of services like Blogspot to set up free blogs that exist only to point visitors to Web sites. Not only are these sites annoying to visitors looking for legitimate information on a topic, Baumgart says, but they also pollute the results of search engines that index the sites.
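The two telltale signs Baumgart describes for comment spam, a link plus text that is off the blog's topic, can be turned into a rough first-pass filter. The Python sketch below is only an illustration: the function names, the word-overlap measure and the `min_shared_words` threshold are assumptions made for this example, not anything presented at the conference, and a filter this simple would flag some legitimate comments and miss some spam.

```python
import re

# Assumed patterns for this sketch: anything that looks like a URL, and plain words.
LINK_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z']+")


def significant_words(text):
    """Return the set of lowercase words longer than three letters."""
    return {w for w in WORD_PATTERN.findall(text.lower()) if len(w) > 3}


def looks_like_comment_spam(comment, post_text, min_shared_words=2):
    """Flag a comment that contains a link and shares few words with the post.

    min_shared_words is a hypothetical threshold chosen for illustration.
    """
    has_link = bool(LINK_PATTERN.search(comment))
    shared = significant_words(comment) & significant_words(post_text)
    return has_link and len(shared) < min_shared_words


post = "Our notes from the spam conference on blog comment filtering"
print(looks_like_comment_spam("Cheap watches at http://example.com/deals", post))
print(looks_like_comment_spam(
    "Great notes on comment filtering, slides at http://example.org", post))
```

A check like this would most plausibly be used to hold comments for human review rather than to delete them outright, since on-topic comments with links are common in legitimate discussion.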
There are some tools available to help blog hosts combat this unwanted, unrelated input. Some blog platforms include administration tools that block specific IP addresses from adding comments, although Baumgart adds that spammers tend to use a range of IP addresses, so blocking them one by one can become impractical. There's also the nofollow link option, an attribute (rel="nofollow") that can be embedded in a link's HTML code to tell search engines indexing a blog not to treat the link as legitimate, she says.
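Both defenses are simple to picture in code. The Python sketch below assumes a blog platform that lets an administrator run a check on each commenter's address and rewrite comment HTML before display; the function names and the blocked ranges (reserved documentation addresses) are hypothetical. Blocking whole networks rather than single addresses is one way around the one-by-one problem Baumgart describes, and the regular expression is a deliberately crude stand-in for a real HTML parser.

```python
import ipaddress
import re

# Hypothetical blocklist: whole ranges rather than individual addresses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Matches an opening <a ...> tag that does not already carry a rel attribute.
ANCHOR_WITHOUT_REL = re.compile(r"<a\b(?![^>]*\brel=)([^>]*)>", re.IGNORECASE)


def is_blocked(address):
    """Return True if the commenter's IP address falls in a blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


def add_nofollow(comment_html):
    """Add rel="nofollow" to links in a comment that have no rel attribute."""
    return ANCHOR_WITHOUT_REL.sub(r'<a\1 rel="nofollow">', comment_html)


print(is_blocked("203.0.113.45"))
print(add_nofollow('Visit <a href="http://example.com">my site</a>'))
```

Neither measure stops a comment from being posted by a spammer on a new address; the nofollow attribute only removes the search-ranking incentive for the link.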
Image-recognition tests, often used on Web sites to verify that a visitor is a human and not a computer by asking the visitor to read a word embedded in an image and type it in, would help against automated scripts. But because some comment spam is entered by humans, those spammers could easily pass the test, Baumgart says.
Blog administrator tools today aren't enough ammunition to fight off the mounting spam problem, leaving blog hosts helpless. One organization that Baumgart didn't want to name has become so inundated with blog spam that it plans to pull the plug on its blog server and start over, losing forever all of the legitimate entries along with the spam. "The best thing to do is to shut down the server and just get rid of it," she says.