The Australian Securities and Investments Commission will next month implement a prototype resulting from a software research project that has improved detection of online investment scams nearly ten-fold.
In February, ASIC announced $1 million in funding for Scamseek, a joint project between the Capital Markets Cooperative Research Centre, the University of Sydney and Macquarie University, and industry partner SMARTS (Securities Markets Automated Research Training and Surveillance), to develop a linguistics-based Internet Document Classification System.
According to Scamseek project director Professor Jon Patrick, from the University of Sydney's School of Information Technologies, ASIC's current Web search tool retrieves on average 80 possible scams, of which just one proves illegal after analysis.
Although the project is only three months old, Patrick said Scamseek's 10 staff (nine full-time) have improved this ratio to better than one scam in 10 suspect documents.
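The ratios quoted by Patrick can be compared directly. The short Python sketch below is an illustration only and is not project code; it treats "one scam in N suspect documents" as a precision figure and includes the one-in-two target the project is aiming for. The variable names are hypothetical, and "better than one in 10" is taken at its lower bound.

```python
from fractions import Fraction

# Scams confirmed per suspect document retrieved, as quoted in the article.
precision = {
    "current ASIC search tool": Fraction(1, 80),
    "Scamseek prototype": Fraction(1, 10),   # "better than" this, so a lower bound
    "Scamseek target": Fraction(1, 2),
}

baseline = precision["current ASIC search tool"]
improvement = precision["Scamseek prototype"] / baseline

for name, p in precision.items():
    # Express each figure as a percentage and as a multiple of the baseline.
    print(f"{name}: {float(p):.2%} of suspect documents are scams, "
          f"{float(p / baseline):.0f}x the baseline")
```

On these figures the prototype is at least eight times as precise as the existing search tool, and the one-in-two target would be 40 times as precise.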
"Our objective is to get to better than one scam in two documents, and we're very optimistic we'll get there," he said.
The director of ASIC's electronic enforcement unit, Keith Inman, said ASIC would implement the prototype within a fortnight.
"We had built a concept system some time ago which was a slight improvement on our system. But this Scamseek prototype is a significant improvement on our current methods [of online scam detection]," he said.
The project is currently limited to searching HTML Web sites for unlicensed investment advice and unlawful fundraising.
Patrick said the project's current phase is planned to finish on September 30. However, Inman said if the good results continued, the project would progress to chat sites.
"Our research indicates that people use a range of channels on the Internet for these scams. They use HTML sites, bulletin boards, and chat sites.
"Chat sites will be a different vocabulary, a live conversation, but Jon's [Patrick] methodology uses machine learning to profile new trends and improve the hit rate on topics," Inman said.
Inman said the success of Scamseek had reduced the need for an ASIC 'surf day' this year. On these days, 20 ASIC staff each trawl the Internet for four hours searching for scams. More than 1000 Web pages are viewed, but only two likely scams are found. Surf days are usually held a few times a year, he said.
Using the Scamseek prototype would increase ASIC's efficiency, Inman said, as the time usually taken to find scams could be spent pursuing Scamseek's results.
Professor Patrick said the way the project worked was that "ASIC supplied us with 7500 documents fitting three classes. There are the scam, [suspect] scam-light and non-scam classes. The target scam class is about 1.8% of the sample."
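The size of the target class shows why the task is hard. The Python sketch below works through the arithmetic on the figures Patrick gives; it assumes, for illustration only, that scam-light documents are counted with the non-scam documents, and the names used are hypothetical.

```python
from fractions import Fraction

SAMPLE_SIZE = 7500                  # documents supplied by ASIC
SCAM_SHARE = Fraction(18, 1000)     # "about 1.8%" of the sample

scam_docs = SAMPLE_SIZE * SCAM_SHARE       # documents in the target scam class
other_docs = SAMPLE_SIZE - scam_docs       # scam-light plus non-scam (assumption)

# A classifier that never flags anything is right on every non-scam document,
# so its accuracy looks high even though it finds no scams at all.
do_nothing_accuracy = other_docs / SAMPLE_SIZE

print(f"scam documents: about {int(scam_docs)}")
print(f"accuracy of flagging nothing: {float(do_nothing_accuracy):.1%}")
```

With so few scams in the sample, raw accuracy says little. What matters is how many flagged documents are real scams and how many real scams are missed.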
The project team, which consists of linguists, computational linguists, and software engineers, was not told of the classes or allowed to see the documents before development, Patrick said.
"I can report that the linguists are having a very high success rate in identifying scams," he said.
"In some of the subsections of scams they're identifying 100 per cent of the scams correctly through hand-crafted methods."
The linguists were having "large success" with Nigerian e-mail scams, he said.
The software engineers were finding the challenge harder, however.
Due to the experimental nature of the project, the team chose the Python programming language to help in rewriting large amounts of code.
"Part of the problem is we're only finding 30% of the scams in the sample. There's some we're not seeing using our system.
"The challenge is can you reduce the workload [of analysing suspect documents], but how many scams are being sifted out in the process," he said.
The Scamseek project has involved three systems, he said. A metasearch engine, or 'Web Spider', searches HTML documents for possible scams. These documents are then analysed by the Statistical Information Retrieval System. The classifier used in this system is developed on a lab system before being exported.
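The article does not describe how the classifier works internally. As a rough illustration of the workflow only (train in the lab, export, then classify crawled pages), the Python sketch below uses a simple word log-odds scorer as a stand-in. The training texts, function names and the JSON export format are all hypothetical and are not taken from Scamseek.

```python
import json
import math
from collections import Counter

def train(labelled_docs):
    """Lab step: learn per-word log-odds of 'scam' vs 'non-scam' (add-one smoothing)."""
    counts = {"scam": Counter(), "non-scam": Counter()}
    for text, label in labelled_docs:
        counts[label].update(text.lower().split())
    vocab = set(counts["scam"]) | set(counts["non-scam"])
    totals = {label: sum(c.values()) + len(vocab) for label, c in counts.items()}
    return {
        w: math.log((counts["scam"][w] + 1) / totals["scam"])
           - math.log((counts["non-scam"][w] + 1) / totals["non-scam"])
        for w in vocab
    }

def classify(model, text):
    """Production step: sum the log-odds of known words; a positive score is suspect."""
    score = sum(model.get(w, 0.0) for w in text.lower().split())
    return "scam" if score > 0 else "non-scam"

# Hypothetical toy training data; the real sample was 7500 documents from ASIC.
lab_docs = [
    ("guaranteed returns send money now", "scam"),
    ("risk free investment guaranteed profit", "scam"),
    ("annual report for licensed fund", "non-scam"),
    ("licensed adviser discloses fees and risk", "non-scam"),
]

exported = json.dumps(train(lab_docs))   # export from the lab system
model = json.loads(exported)             # load on the production system

# Stand-in for pages returned by the Web spider.
crawled = ["guaranteed profit send money", "licensed fund annual report"]
labels = [classify(model, page) for page in crawled]
print(labels)
```

Keeping training separate from classification means the lab model can be retrained and re-exported without touching the crawler.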