Search engines can't keep up with Web

A new report confirms what surfers already know: search engines simply can't keep up with the Web's growth.

In an article to be published in the July 8 issue of Nature, NEC Research Institute scientists Steve Lawrence and C. Lee Giles present the findings of their latest study, which shows that search engines are providing inadequate, out-of-date and biased coverage of the ever-expanding Web.

From December 1997 to February 1999, the Web more than doubled in size, from 320 million pages to 800 million. Over the same period, the top-ranking search engine's coverage of those pages dropped from approximately 34 per cent to 16 per cent.
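Converting the percentages back into page counts gives a rough sense of scale. The short Python sketch below uses only the article's figures and, for illustration, treats the "approximately 34 per cent" estimate as exact. On those assumptions, the top-ranking engine's index probably still grew in absolute terms (from roughly 109 million to 128 million pages) even as its share of the Web fell by more than half.

```python
# Figures as reported in the article (December 1997 vs February 1999).
web_1997 = 320_000_000  # estimated Web pages, Dec 1997
web_1999 = 800_000_000  # estimated Web pages, Feb 1999
share_1997 = 0.34       # top engine's coverage, Dec 1997 (approximate)
share_1999 = 0.16       # top engine's coverage, Feb 1999

# Growth factor of the Web: 2.5, i.e. "more than doubled".
growth = web_1999 / web_1997

# Absolute page counts implied by the percentages (rough, because the
# 1997 share is itself an approximation).
indexed_1997 = share_1997 * web_1997
indexed_1999 = share_1999 * web_1999

print(f"Web growth factor: {growth:.1f}x")
print(f"Top engine, 1997: ~{indexed_1997 / 1e6:.0f} million pages")
print(f"Top engine, 1999: ~{indexed_1999 / 1e6:.0f} million pages")
```

The point is that falling coverage here reflects the Web outgrowing the engines, not the engines shrinking.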

"Though the [information-access] situation now is better than it was before the Web and search engines, it is limited," says Lawrence. "It's not as good as it could be."

This year's study compared the ability of 11 major search engines to produce results for 1,050 queries. Northern Light (http://www.northernlight.com) was the top-scoring search engine, covering an estimated 16 per cent of Web pages. AltaVista (http://www.altavista.com) and Snap (http://snap.com) tied for a close second place with 15.5 per cent. After taking the top spot in last year's study, HotBot (http://www.hotbot.com) slipped to fourth place this year with 11.3 per cent.

EuroSeek (http://www.euroseek.net) came last on the list, finding only 2.2 per cent of Web pages. Marquee portals Lycos, Excite and Yahoo fared little better, with 2.5 per cent, 5.6 per cent and 7.4 per cent coverage, respectively.

Combined, the 11 search engines found only 335 million pages, or 42 per cent of the total Web. This means that users of metasearch engines, such as MetaCrawler (http://www.metacrawler.com) and Ask Jeeves (http://www.aj.com), have a substantially better chance of finding results on a specific topic than those who use a single search engine, according to the researchers.
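The arithmetic behind the metasearch argument can be checked directly from the article's numbers. This Python sketch assumes an idealised metasearch engine that queries all 11 engines at once; real metasearch services may query fewer, so the combined figure is best read as an upper bound on what they could reach.

```python
web_total = 800_000_000    # estimated size of the Web, Feb 1999
combined = 335_000_000     # pages found by all 11 engines together
best_single_share = 0.16   # Northern Light's estimated coverage

# Share of the Web covered by the union of all 11 indexes
# (0.41875, reported in the article as 42 per cent).
combined_share = combined / web_total

# How many times more of the Web the union covers than the best single engine.
advantage = combined_share / best_single_share

print(f"Combined coverage: {combined_share:.0%}")
print(f"Union vs best single engine: {advantage:.1f}x")
```

Under this assumption, querying every engine reaches roughly 2.6 times as much of the Web as the best single engine.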

"Basically, it appears that there are limits to how much [the search engines] can index," says Lawrence, who adds that search engines face diminishing returns in their efforts to do a better job of indexing the Web.

Rather than buying additional computing resources to index more pages, the search engines may instead spend their money on new services, such as calendaring or chat. Lawrence says funding extra applications "may be better in terms of maximising ad revenue, and therefore, stock prices."

Moreover, most people make relatively simple queries that can be answered from a small database of pages, and searching through 800 million pages might take longer than most surfers would be willing to wait. As a result, search engines may have little incentive to improve their coverage.

The scientists found that it can take months for search engines to index new pages. One analysis showed that, on average, a search engine took 186 days to include a new page in its results for a given query.

Most disconcerting of all, search engines appear to be biased in what they index.

The study found that search engines were more likely to index the most-trafficked pages or those with many links pointing to them. Commercial (.com) sites were also more likely to be indexed than educational (.edu) sites. And except on AltaVista, pages from US sites were more likely to be indexed than pages from non-US sites.

The danger, according to Lawrence, is that surfers will be able to find only the most popular sources for certain types of information, which could result in biased research.

It might not matter where users find commonly available information, such as the latest stock quotes, but for other kinds of decisions, wide-ranging searches are necessary. For example, medical research efforts can be wasted if scientists aren't aware that a similar study already exists. And finding background on all candidates in a local election can be critical to casting an informed vote.

The search engines' gaps in coverage create an opportunity for publishers to build thorough directories on niche topics. Search engines could then direct traffic to these specialist sources.

The good news? Lawrence estimates that the exponential growth in the size of the Web will slow, eventually. "If we wait a few years, the rate of increase in computational resources is faster than the generation of original text or content by humans."

Keep up with the latest tech news, reviews and previews by subscribing to the Good Gear Guide newsletter.
