Search engines can't keep up with Web

A new report confirms what surfers already know: search engines simply can't keep up with the Web's growth.

In an article to be published in the July 8 issue of Nature, NEC Research Institute scientists Steve Lawrence and C. Lee Giles present the findings of their latest study, which shows search engines are providing inadequate, out-of-date and biased coverage of the ever-expanding Web.

From December 1997 to February 1999, the Web more than doubled in size, from 320 million pages to 800 million. Over the same period, the top-ranking search engine's coverage of those pages dropped from approximately 34 to 16 per cent.

"Though the [information-access] situation now is better than it was before the Web and search engines, it is limited," says Lawrence. "It's not as good as it could be."

This year's study compared the ability of 11 major search engines to produce results for 1,050 queries. Northern Light was the top-scoring searcher, covering an estimated 16 per cent of Web pages. AltaVista and Snap tied for a close second place with 15.5 per cent. After scoring the top spot in last year's study, HotBot slipped to fourth place this year with 11.3 per cent.

EuroSeek landed the bottom berth on the list, finding only 2.2 per cent of Web pages. Marquee portals Lycos, Excite and Yahoo fared only slightly better, with 2.5 per cent, 5.6 per cent and 7.4 per cent coverage, respectively.

Combined, the 11 search engines found only 335 million pages, or 42 per cent, of the total Web. This means that users of metasearch engines, such as MetaCrawler and Ask Jeeves, have a substantially better chance of finding results on a specific topic compared to those who use a single search engine, according to the researchers.
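
The headline figures can be checked with simple arithmetic. The following is a minimal sketch that uses only the page counts quoted in this article; the function and variable names are illustrative and do not come from the study.

```python
def coverage_percent(pages_indexed: float, pages_total: float) -> float:
    """Return the share of the Web that is indexed, as a percentage."""
    return 100.0 * pages_indexed / pages_total


# Figures quoted in the article (estimates, in pages).
WEB_SIZE_DEC_1997 = 320e6
WEB_SIZE_FEB_1999 = 800e6
COMBINED_INDEXED = 335e6  # pages found by all 11 engines together

# Combined coverage of the 11 engines against the February 1999 Web.
combined = coverage_percent(COMBINED_INDEXED, WEB_SIZE_FEB_1999)

# Growth factor of the Web over the period studied.
growth = WEB_SIZE_FEB_1999 / WEB_SIZE_DEC_1997

print(f"Combined coverage: {combined:.0f} per cent")
print(f"Web growth factor: {growth:.1f}x")
```

The combined figure rounds to the 42 per cent reported, and the growth factor of 2.5 matches the statement that the Web more than doubled in size.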

"Basically, it appears that there are limits to how much [the search engines] can index," says Lawrence, who adds that search engines face diminishing returns in their efforts to do a better job of indexing the Web.

Rather than purchase additional computational resources to index more pages, the search engines may use funds to offer new services, such as calendaring or chat. Lawrence says funding extra applications "may be better in terms of maximising ad revenue, and therefore, stock prices."

Add to this the fact that most people make relatively simple queries that require only a small database of pages. What's more, churning through 800 million pages might take longer than most surfers would be willing to wait. As a result, search engines may have little incentive to improve their coverage.

The scientists found that it can take months for search engines to index new pages. One analysis showed that a search engine took an average of 186 days before it included a new page in its results for a certain query.

Most disconcerting is the fact that search engines appear to be biased in terms of their indexing.

The study found that search engines were more likely to index the most-trafficked pages or those that had many links directing traffic to them. Also, commercial (.com) sites were more likely to be indexed than educational (.edu) sites. And except on AltaVista, pages from US sites were more likely to be indexed than non-US sites.

The danger, according to Lawrence, is that surfers will be able to find only the most popular places to get certain types of information -- which could result in biased research.

It might not matter where users find commonly available information, such as the latest stock quotes, but for other kinds of decisions, wide-ranging searches are necessary. For example, medical research efforts can be wasted if scientists aren't aware that a similar study already exists. And finding background on all candidates in a local election can be critical to casting an informed vote.

The search engines' gaps in coverage leave an opportunity for publishers to create thorough directories on niche topics. Search engines could then direct traffic to these experts.

The good news? Lawrence estimates that the exponential growth in the size of the Web will slow -- eventually. "If we wait a few years, the rate of increase in computational resources is faster than the generation of original text or content by humans."
