Search engines can't keep up with Web

A new report confirms what surfers already know: search engines simply can't keep up with the Web's growth.

In an article to be published in the July 8 issue of Nature, NEC Research Institute scientists Steve Lawrence and C. Lee Giles present the findings of their latest study, which shows search engines are providing inadequate, out-of-date and biased coverage of the ever-expanding Web.

From December 1997 to February 1999, the Web more than doubled in size, from 320 million pages to 800 million. Over the same period, the top-ranking search engine's coverage of those pages dropped from approximately 34 to 16 per cent.
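The arithmetic behind those figures is worth a quick look. The sketch below uses only the rounded numbers quoted above; the variable names are illustrative, and the derived page counts are implied by those rounded figures rather than reported by the study. Note that the top-ranking engine was not the same engine in both years.

```python
# Rounded figures quoted in the article (pages in millions).
pages_1997 = 320          # estimated Web size, December 1997
pages_1999 = 800          # estimated Web size, February 1999
top_coverage_1997 = 34    # per cent covered by the top-ranking engine, 1997
top_coverage_1999 = 16    # per cent covered by the top-ranking engine, 1999

# How much the Web grew over the period.
web_growth = pages_1999 / pages_1997

# Pages (in millions) implied for the top-ranking engine in each year.
indexed_1997 = pages_1997 * top_coverage_1997 / 100
indexed_1999 = pages_1999 * top_coverage_1999 / 100

# How much the leading index grew over the same period.
index_growth = indexed_1999 / indexed_1997

print(f"Web grew {web_growth:.1f}x")
print(f"Top index: about {indexed_1997:.0f}M -> {indexed_1999:.0f}M pages "
      f"({index_growth:.2f}x)")
```

On these numbers the leading index still grew in absolute terms, but far more slowly than the Web itself, which is why its share of coverage fell by about half.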

"Though the [information-access] situation now is better than it was before the Web and search engines, it is limited," says Lawrence. "It's not as good as it could be."

This year's study compared the ability of 11 major search engines to produce results for 1,050 queries. Northern Light (http://www.northernlight.com) was the top-scoring engine, covering an estimated 16 per cent of Web pages. AltaVista (http://www.altavista.com) and Snap (http://snap.com) tied for a close second place with 15.5 per cent. After scoring the top spot in last year's study, HotBot (http://www.hotbot.com) slipped to fourth place this year with 11.3 per cent.

EuroSeek (http://www.euroseek.net) landed the bottom berth on the list, finding only 2.2 per cent of Web pages. Marquee portals Lycos, Excite and Yahoo fared only slightly better, with 2.5 per cent, 5.6 per cent and 7.4 per cent coverage, respectively.

Combined, the 11 search engines found only 335 million pages, or 42 per cent, of the total Web. This means that users of metasearch engines, such as MetaCrawler (http://www.metacrawler.com) and Ask Jeeves (http://www.aj.com), have a substantially better chance of finding results on a specific topic than those who use a single search engine, according to the researchers.
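To put the combined figure next to the single-engine results, here is a rough sketch using only the percentages quoted in this article. It lists the eight engines named here (the study covered 11), and the dictionary and variable names are illustrative.

```python
# Figures quoted in the article (February 1999 estimates, pages in millions).
WEB_PAGES = 800
COMBINED_INDEXED = 335

# Per-engine coverage as quoted, in per cent of the estimated Web.
# Only the eight engines named in the article are listed here.
coverage_pct = {
    "Northern Light": 16.0,
    "AltaVista": 15.5,
    "Snap": 15.5,
    "HotBot": 11.3,
    "Yahoo": 7.4,
    "Excite": 5.6,
    "Lycos": 2.5,
    "EuroSeek": 2.2,
}

# Combined coverage of all engines in the study, as a percentage.
combined_pct = 100 * COMBINED_INDEXED / WEB_PAGES

# The best single engine and how the combined figure compares with it.
best_engine = max(coverage_pct, key=coverage_pct.get)
advantage = combined_pct / coverage_pct[best_engine]

print(f"Combined coverage: {combined_pct:.0f} per cent")
print(f"Best single engine: {best_engine} at {coverage_pct[best_engine]} per cent")
print(f"Combined vs best single engine: {advantage:.1f}x")
```

Even against the best single engine, the combined index reaches more than two and a half times as many pages, which is the basis of the researchers' point about metasearch.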

"Basically, it appears that there are limits to how much [the search engines] can index," says Lawrence, who adds that search engines face diminishing returns in their efforts to do a better job of indexing the Web.

Rather than purchase additional computational resources to index more pages, the search engines may use funds to offer new services, such as calendaring or chat. Lawrence says funding extra applications "may be better in terms of maximising ad revenue, and therefore, stock prices."

Add to this the fact that most people make relatively simple queries that require only a small database of pages. In addition, churning through 800 million pages might take longer than most surfers would be willing to wait. As a result, search engines may have little incentive to improve their coverage.

The scientists found that it can take months for search engines to index new pages. One analysis showed that a search engine took an average of 186 days before it included a new page in its results for a certain query.

Most disconcerting is the fact that search engines appear to be biased in terms of their indexing.

The study found that search engines were more likely to index the most-trafficked pages or those that had many links directing traffic to them. Also, commercial (.com) sites were more likely to be indexed than educational (.edu) sites. And except on AltaVista, pages from US sites were more likely to be indexed than non-US sites.

The danger, according to Lawrence, is that surfers will be able to find only the most popular places to get certain types of information -- which could result in biased research.

It might not matter where users find commonly available information, such as the latest stock quotes, but for other kinds of decisions, wide-ranging searches are necessary. For example, medical research efforts can be wasted if scientists aren't aware that a similar study already exists. And finding background on all candidates in a local election can be critical to casting an informed vote.

The search engines' gaps in coverage leave an opportunity for publishers to create thorough directories on niche topics. Search engines could then direct traffic to these experts.

The good news? Lawrence estimates that the exponential growth in the size of the Web will slow -- eventually. "If we wait a few years, the rate of increase in computational resources is faster than the generation of original text or content by humans."
