Google talks up smart software for reliability

16.08.2006

"You get reliability by using replication and redundancy [so] for every machine you have multiple cross connections. The idea is if you have multiple failures you have enough replication."

Google's network has multiple levels of redundancy: between servers, between racks, and between whole data centers. This redundancy is not random, because Google's software is aware of the underlying infrastructure. The company's PageRank algorithm determines which systems have the most relevance. The PageRank is then sliced into pieces, dubbed shards, which can be copied to a range of different machines.
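
To make the idea of infrastructure-aware redundancy concrete, here is a minimal sketch of how copies of one shard might be spread across data centers and racks. It is an illustration only, not Google's actual placement logic; the names Machine and place_replicas, and the rule of preferring an unused data center first and an unused rack second, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    # Hypothetical description of where a server sits in the infrastructure.
    name: str
    rack: str
    datacenter: str


def place_replicas(shard_id, machines, copies):
    """Choose `copies` machines for one shard.

    Assumed policy: prefer a data center that holds no copy yet,
    then a rack that holds no copy yet.
    """
    # Rotate the starting point by shard id so shards spread over the fleet.
    start = shard_id % len(machines)
    candidates = machines[start:] + machines[:start]
    chosen = []
    while len(chosen) < copies:
        remaining = [m for m in candidates if m not in chosen]
        if not remaining:
            raise ValueError("not enough machines for the requested copies")
        used_dcs = {m.datacenter for m in chosen}
        used_racks = {(m.datacenter, m.rack) for m in chosen}
        # False sorts before True, so unused locations win; ties keep list order.
        best = min(
            remaining,
            key=lambda m: (
                m.datacenter in used_dcs,
                (m.datacenter, m.rack) in used_racks,
            ),
        )
        chosen.append(best)
    return chosen


fleet = [
    Machine("a", "r1", "dc1"),
    Machine("b", "r1", "dc1"),
    Machine("c", "r2", "dc1"),
    Machine("d", "r1", "dc2"),
]
placement = place_replicas(0, fleet, 3)
```

With this small fleet, three copies of shard 0 land on machines in two data centers and three distinct racks, so losing any single rack still leaves two copies available.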

The Web search index and Web documents have the same structure and are stitched together with replication. Miscellaneous services, like spell checking and ad serving, run in parallel.

"More than 1000 machines will process some part of [a person's] query and a lot are just there for redundancy," Pike said, adding that the Google File System (GFS) is another such service.
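
The pattern Pike describes, in which many machines each handle part of a query and spares stand by, can be sketched as a lookup that visits every shard and falls back to another copy when one is down. This is a simplified illustration, not Google's serving code: the Replica class, the ReplicaDown exception, and the search function are hypothetical, and the shards are visited one after another here, whereas a real system would query them in parallel.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot answer (hypothetical failure signal)."""


class Replica:
    # One copy of one index shard: maps a search term to matching document ids.
    def __init__(self, name, postings, up=True):
        self.name = name
        self.postings = postings
        self.up = up

    def lookup(self, term):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.postings.get(term, [])


def search(shards, term):
    """Ask every shard for `term`, trying each shard's replicas in order."""
    results = []
    for replicas in shards:
        for replica in replicas:
            try:
                results.extend(replica.lookup(term))
                break  # this shard answered; move on to the next shard
            except ReplicaDown:
                continue  # try the next copy of the same shard
        else:
            # Every copy of this shard failed, so the answer would be incomplete.
            raise RuntimeError("all replicas of a shard are down")
    return sorted(results)


shard0 = [
    Replica("a", {"cat": [1, 3]}, up=False),  # failed machine
    Replica("b", {"cat": [1, 3]}),            # redundant copy takes over
]
shard1 = [Replica("c", {"cat": [2]})]
hits = search([shard0, shard1], "cat")
```

In this example the first copy of shard 0 is down, yet the query still returns documents 1, 2 and 3 because the second copy answers in its place.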

"We want to scale as we have to serve millions of users [and] replication is how you scale," he said.