Search engines can't keep up with Web

A new report confirms what surfers already know: search engines simply can't keep up with the Web's growth.

In an article to be published in the July 8 issue of Nature, NEC Research Institute scientists Steve Lawrence and C. Lee Giles present the findings of their latest study, which shows that search engines provide inadequate, out-of-date and biased coverage of the ever-expanding Web.

From December 1997 to February 1999, the Web more than doubled in size, from 320 million pages to 800 million. Over the same period, the top-ranking search engine's coverage of those pages dropped from approximately 34 to 16 per cent.
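A quick back-of-the-envelope calculation puts these percentages in absolute terms. The Python sketch below uses only the rounded figures reported above. It assumes "top-ranking engine" means whichever engine led in each period (HotBot in last year's study, Northern Light in this one), so the page counts are rough estimates, not numbers published by the researchers.

```python
# Rounded figures from the article: Web size in millions of pages,
# coverage as a fraction of the Web indexed by the leading engine.
web_1997 = 320            # December 1997
web_1999 = 800            # February 1999
top_coverage_1997 = 0.34  # approximate, leading engine in 1997
top_coverage_1999 = 0.16  # leading engine in 1999

growth = web_1999 / web_1997
pages_1997 = web_1997 * top_coverage_1997  # estimated pages indexed, 1997
pages_1999 = web_1999 * top_coverage_1999  # estimated pages indexed, 1999

print(f"Web growth factor: {growth:.1f}x")
print(f"Leading engine, Dec 1997: ~{pages_1997:.0f} million pages")
print(f"Leading engine, Feb 1999: ~{pages_1999:.0f} million pages")
```

By this rough estimate the leading engine indexed somewhat more pages in 1999 than in 1997, but the Web grew about two and a half times, so its share fell by more than half.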

"Though the [information-access] situation now is better than it was before the Web and search engines, it is limited," says Lawrence. "It's not as good as it could be."

This year's study compared the ability of 11 major search engines to produce results for 1,050 queries. Northern Light (http://www.northernlight.com) was the top-scoring searcher, covering an estimated 16 per cent of Web pages. AltaVista (http://www.altavista.com) and Snap (http://snap.com) tied for a close second place with 15.5 per cent. After scoring the top spot in last year's study, HotBot (http://www.hotbot.com) slipped to fourth place this year with 11.3 per cent.

EuroSeek (http://www.euroseek.net) landed the bottom berth on the list, finding only 2.2 per cent of Web pages. Marquee portals Lycos, Excite and Yahoo fared little better, with 2.5 per cent, 5.6 per cent and 7.4 per cent coverage, respectively.

Combined, the 11 search engines found only 335 million pages, or 42 per cent, of the total Web. This means that users of metasearch engines, such as MetaCrawler (http://www.metacrawler.com) and Ask Jeeves (http://www.aj.com), have a substantially better chance of finding results on a specific topic compared to those who use a single search engine, according to the researchers.
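A simple check, again in Python and using only the article's rounded numbers, shows where the researchers' point about metasearch comes from: it compares the combined coverage of all 11 engines with that of the best single engine, Northern Light.

```python
# Rounded figures from the article (pages in millions).
combined_pages = 335  # pages found by all 11 engines together
web_size = 800        # estimated size of the Web, February 1999
best_single = 0.16    # Northern Light's estimated coverage

combined_share = combined_pages / web_size
advantage = combined_share / best_single  # combined vs. best single engine

print(f"Combined coverage: {combined_share:.0%}")
print(f"Roughly {advantage:.1f} times the best single engine")
```

On these figures, querying all the engines together reaches roughly two and a half times as much of the Web as the top engine alone, though still well under half of it.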

"Basically, it appears that there are limits to how much [the search engines] can index," says Lawrence, adding that search engines face diminishing returns in their efforts to index more of the Web.

Rather than buy additional computational resources to index more pages, the search engines may spend their money on new services, such as calendaring or chat. Lawrence says funding extra applications "may be better in terms of maximising ad revenue, and therefore, stock prices."

Add to this the fact that most people make relatively simple queries, which can be answered from a small database of pages. Searching all 800 million pages might also take longer than most surfers would be willing to wait. As a result, search engines may have little incentive to improve their coverage.

The scientists found that it can take months for search engines to index new pages. One analysis showed that a search engine took an average of 186 days before it included a new page in its results for a certain query.

Most disconcerting, search engines appear to be biased in what they choose to index.

The study found that search engines were more likely to index the most-trafficked pages or those that had many links directing traffic to them. Also, commercial (.com) sites were more likely to be indexed than educational (.edu) sites. And except on AltaVista, pages from US sites were more likely to be indexed than non-US sites.

The danger, according to Lawrence, is that surfers will be able to find only the most popular places to get certain types of information -- which could result in biased research.

It might not matter where users find commonly available information, such as the latest stock quotes, but for other kinds of decisions, wide-ranging searches are necessary. For example, medical research efforts can be wasted if scientists aren't aware that a similar study already exists. And finding background on all candidates in a local election can be critical to casting an informed vote.

The search engines' gaps in coverage leave an opportunity for publishers to create thorough directories on niche topics. Search engines could then direct traffic to these experts.

The good news? Lawrence estimates that the exponential growth in the size of the Web will slow -- eventually. "If we wait a few years, the rate of increase in computational resources is faster than the generation of original text or content by humans."
