Having used their creation to deliver what they describe as a 100-fold productivity improvement in the Australian Securities and Investments Commission's (ASIC) detection of investment scams, Sydney University researchers are now turning their attention to the spam pandemic.
Professor Jon Patrick, chair of language technology at the university's School of Information Technologies, said the notion of semantic engines, although a long time coming, has arrived, and that this new approach, built on advanced document classification technology, can demonstrate a return on investment (ROI).
"Every [such] problem is the semantic analysis of text, as distinct from wordprocessing," Patrick said. "The best example is a search on Google with a string and wanting it to retrieve documents relevant to the meaning of the word. Our technology goes further than that as it characterizes documents reliably."
The semantic engine takes text from anywhere - including any online or Internet document - and processes it. For example, ASIC uses a general search engine and feeds the results into the semantic engine in its search for outrageous financial scams.
Patrick said the technology is particularly effective when the documents being sought make up only a tiny fraction of those available.
"In ASIC's case it had to read 80 documents to find one. Now it can find four by reading five," he said.
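Patrick's figures can be restated as the share of documents read that turn out to be relevant (a hit rate), and as the number of documents read per scam found:

$$
\text{hit rate: } \frac{1}{80} = 1.25\% \;\longrightarrow\; \frac{4}{5} = 80\%, \qquad
\text{documents read per scam found: } \frac{80}{1} = 80 \;\longrightarrow\; \frac{5}{4} = 1.25
$$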
The system was developed to reduce the time lawyers must spend reviewing documents; lawyers, he said, are the most expensive part of the operation.
"By catching scammers early it is saving the community tens of millions. ASIC can rightly claim a social ROI."
Patrick said the only way to beat the spam problem is to build a semantic engine.
"If we classify by meaning we could knock out all Viagra spam quickly," he said. "There are probably 20 to 30 classes of spam topics and below that there are much smaller groups. To avoid false positives our technology analyses the whole document, not just the text."
The technology could be offered at the ISP level or as a hosted service, and the semantic engine improves as it receives feedback.
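Patrick does not describe the engine's internals. As a rough illustration of the general idea of sorting whole documents into topic classes and refining the model from feedback, the sketch below uses a multinomial naive Bayes classifier, a standard textbook technique swapped in here as an assumption; it is not the Sydney University engine, which Patrick says is customized machine learning tuned to a particular semantic space. The class names (`TopicClassifier`, `learn`, `classify`), the labels and the training snippets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for real linguistic analysis."""
    return re.findall(r"[a-z']+", text.lower())


class TopicClassifier:
    """Hypothetical multinomial naive Bayes classifier over whole-document word counts."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.total_words = Counter()             # total word tokens per class
        self.vocab = set()                       # every word seen in training

    def learn(self, text, label):
        """Add one labelled document; calling this on user feedback updates the model in place."""
        words = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.total_words[label] += len(words)
        self.vocab.update(words)

    def classify(self, text):
        """Return the class with the highest log-posterior, or None if nothing has been learned."""
        words = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / n_docs)  # class prior
            denom = self.total_words[label] + v
            for w in words:
                # Laplace (add-one) smoothing so an unseen word does not rule a class out
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: two spam topic classes and legitimate mail ("ham")
clf = TopicClassifier()
clf.learn("cheap viagra pills online no prescription", "spam/pharma")
clf.learn("guaranteed returns double your investment offshore fund", "spam/investment")
clf.learn("minutes of the project meeting and next steps", "ham")
print(clf.classify("buy viagra pills cheap"))  # spam/pharma

# Feedback: a user flags a new kind of spam and the model gains a class for it
clf.learn("win a free lottery prize now", "spam/lottery")
print(clf.classify("free lottery prize"))  # spam/lottery
```

Naive Bayes is chosen here only because its counts can be updated one document at a time, which mirrors the "improves with feedback" behaviour in a few lines; a production spam filter would need far more training data and richer features than bare word counts.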
"The semantic engine has to be customized for use as it uses a machine learning technology tuned for the semantic space you want it to search in," he said. "It's not a piece of shrink-wrapped software and from beginning to end would be a six- to nine-month [project]."
After the success of the ASIC project, Patrick is looking for companies interested in an antispam trial and is also considering commercializing the "very valuable technology" by creating a spin-off company.
Patrick is also confident that the development of the semantic engine has demonstrated how to get research and development "to sleep together in one bed".
"We have achieved the same objective," he said, adding that an insurance company could use the technology to survey accident reports to detect more fraud, pay out less, and reduce insurance costs.