Researchers dig for hidden links in spam

Two researchers at the University of Quebec are refining a way to identify links advertised in spam messages.

Filtering spam messages is a thankless job for software.

For every 100 spam e-mails, one message usually gets through, an irritating pitch with links to Web sites selling questionable drugs or sketchy Rolexes.

The links contained within spam are one indicator in determining whether it should be blocked. Often after a large spam run, the addresses of spammy Web sites will be added to blocklists that are used by antispam software to cull future messages with those links.
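As a rough illustration of that blocklist check, here is a minimal Python sketch. The hostnames, the `BLOCKLIST` set and both function names are hypothetical; real antispam products keep far larger lists and combine this with other signals.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of hostnames collected after earlier spam runs.
BLOCKLIST = {"cheap-pills.example", "replica-watches.example"}

def is_blocklisted(url, blocklist=BLOCKLIST):
    """Return True if the URL's host, or any parent domain of it, is listed."""
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # Malformed URLs cannot be matched against the list.
        return False
    labels = host.split(".")
    # Check the full host first, then each parent domain in turn.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))

def should_block(links):
    """A message is culled if any link it carries points at a listed host."""
    return any(is_blocklisted(link) for link in links)
```

The check only works if the filter can read the links in the first place, which is the weakness spammers exploit.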

To get around these blocklists, spammers construct e-mails with links that can't be identified by filters but are still valid in the messages, said Christopher Fuhrman, a professor of software engineering in the Department of Software and IT Engineering at the University of Quebec.

Spammers do this by "munging" the HTML (Hypertext Markup Language) -- adding backslashes, taking out tags -- so that the message and its links are still readable by the rendering engines of browsers or e-mail clients, but appear as a garble of nonsense to filters. The technique is also known as obfuscation.
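The gap between a strict filter and a forgiving renderer can be sketched in a few lines of Python. The munged sample below is invented for illustration and is not taken from real spam, the regular expression is a deliberately naive filter, and Python's standard `html.parser` merely stands in for a tolerant rendering engine.

```python
import re
from html.parser import HTMLParser

# A deliberately naive filter: it only matches lowercase, double-quoted hrefs.
NAIVE_HREF = re.compile(r'href="(http://[^"]+)"')

class LinkCollector(HTMLParser):
    """Collects href values from anchor tags, however sloppily they are written."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and decodes
        # character references in attribute values before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def tolerant_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Hypothetical munged link: uppercase names, a line break inside the tag,
# single quotes, and the dot written as the character reference &#46;
MUNGED = "<A\nHREF='http://cheap-pills&#46;example/buy'>Great deals</A>"

print(NAIVE_HREF.findall(MUNGED))   # []
print(tolerant_links(MUNGED))       # ['http://cheap-pills.example/buy']
```

The naive pattern finds nothing, while the tolerant parser recovers the same address a reader would end up clicking.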

It's a trial-and-error process, since spammers don't read HTML Web standards. "Spammers just want to get the cash," Fuhrman said.

Tamper with the HTML too much, and the message won't render at all. Too little, and filters snare the message.

So spammers aim for a narrow gap: Most browsers and e-mail clients can render a certain amount of munged HTML, although the tolerances vary depending on the application.

Fuhrman theorizes that spammers test their messages using Microsoft's widely used Outlook program, which uses the same HTML rendering engine as its Internet Explorer (IE) browser.

So Fuhrman and one of his graduate students, Hicham El Alami, are writing a program that uses IE's rendering engine to "parse" messages, or extract the links.

Services such as SpamCop already do this. SpamCop -- part of IronPort Systems, a subsidiary of Cisco -- has a Web-based service that uses algorithms to parse links out of spam messages submitted by users.

Those algorithms are hard to write, although SpamCop's is pretty good, Fuhrman said. Fuhrman and El Alami are interested in creating an alternative way to do that same parsing without needing to constantly tweak an algorithm to keep up with new tricks used by spammers.

It's hard to write a parser that will read links the same way IE's rendering engine does since Microsoft's source code is secret, Fuhrman said. So a better idea would be just to use that engine as part of a program to parse messages. A variety of tools exist to manipulate IE's rendering engine through APIs (application programming interfaces), Fuhrman said.

The links that IE's engine renders would be reported to a blocklist service. Fuhrman wrote a model version of his idea that works in Java, and El Alami is now working on one for .NET, Microsoft's application development framework.
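The reporting step might look something like the Python sketch below. It is not the researchers' Java or .NET code. The `hosts_to_report` function and its `extract_links` parameter are hypothetical names, and the engine is passed in as a plain callable so that any link extractor (ideally one driven by a real rendering engine) can be plugged in.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlsplit

def hosts_to_report(message_html: str,
                    extract_links: Callable[[str], Iterable[str]]) -> List[str]:
    """Return the unique hostnames advertised in a message, sorted for reporting.

    extract_links is an assumed stand-in for the rendering engine: it takes
    the raw message and returns the links the engine would expose.
    """
    hosts = set()
    for link in extract_links(message_html):
        try:
            host = urlsplit(link).hostname  # already lowercased by urlsplit
        except ValueError:
            continue  # skip links too malformed to parse
        if host:
            hosts.add(host)
    return sorted(hosts)
```

Keeping the engine behind a single callable reflects the idea in the article: the extraction logic lives in the renderer, so the surrounding program should not need tweaking each time spammers invent a new trick.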

"I want to ultimately get it as a Web-based engine so that users can paste spam, and when it comes out, it will reveal the links," Fuhrman said.

Jeremy Kirk

IDG News Service