Quickstudy: Deep Web

December 19, 2005

How deep? How big?

According to a 2001 BrightPlanet study, the deep Web is very big indeed: The company found that the 60 largest deep Web sources alone contained 84 billion pages of content holding about 750TB of information, making those 60 sources a resource 40 times larger than the surface Web (which works out to a surface Web of roughly 19TB at the time). Today, BrightPlanet reckons the deep Web totals 7,500TB, with more than 250,000 sites and 500 billion individual documents. And that's just for Web sites in English or European character sets. (For comparison, remember that Google, the largest crawler-based search engine, now indexes some 8 billion pages.)

The study's author, Michael K. Bergman, co-founded BrightPlanet, a vendor of deep Web harvesting software that works mainly with the intelligence community; its products access sites in over 140 languages, many based on non-Latin characters. BrightPlanet routinely ships those products with links to over 70,000 deep Web sources, all translated into English, and Bergman says his customers are probably accessing two to three times that many sources.

The deep Web is getting deeper and bigger all the time, and two factors seem to account for this. First, newer data sources (especially those not in English) tend to be of the dynamic-query/searchable type, which is generally more useful than static pages. Second, governments at all levels around the world have committed to making their official documents and records available on the Web; Bergman says he's aware of at least 10 U.S. states that maintain single-access portals to all state documents and public records.
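To see why crawler-based engines miss this material, here is a minimal Python sketch contrasting the two kinds of source. It is an illustration, not BrightPlanet's method; the example.gov records portal, its search endpoint, and its query parameters are all hypothetical.

    import urllib.parse
    import urllib.request

    # Static page: the same bytes live at a fixed URL, so a crawler that
    # follows links can find and index it.
    with urllib.request.urlopen("https://example.gov/records/index.html") as resp:
        static_page = resp.read()

    # Dynamic-query source (hypothetical endpoint): the results page is
    # generated only in response to a submitted search, so there is no
    # standing URL for a crawler to stumble upon. This is the content
    # that stays in the deep Web.
    params = urllib.parse.urlencode({"name": "Smith", "county": "Worcester"})
    with urllib.request.urlopen("https://example.gov/records/search?" + params) as resp:
        results_page = resp.read()

The first URL can be discovered by following links; the second page exists only for the instant of the query, and a crawler has no way to guess the millions of searches that would surface such pages.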

Interestingly, deep Web sites appear to receive 50% more monthly traffic than surface sites do, and more sites link to them, even though they are largely unknown to the general public. Deep Web sites are typically narrower in scope but likely to offer deeper, more detailed content. According to Bergman, only about 5% of the deep Web requires fees or subscriptions.

Kay is a Computerworld contributing writer in Worcester, Massachusetts. You can contact him at russkay@charter.net.