Google talks up smart software for reliability

16.08.2006
Lots of smaller commodity machines together with smart software can boost the reliability of enterprises' business-critical systems, according to search giant Google.

In a rare insight into the inner workings of the world's most popular search engine, Google principal engineer Rob Pike said the problem is not so much the implementation of algorithms, rather that "everything breaks", and on the scale of Google "a lot will break every day".

"If you have 1000 computers in your cluster you can expect one to die every day [and] ours are not that good [so] stuff happens all the time," Pike said.

"The Internet is so big you are going to expect to see failure; the software has to be robust so you can afford to by stuff more often. You can buy tons of cheap crap [and] it's crappier than you might think."

Speaking in Sydney at an event run by the Australian Service for Knowledge of Open Source Software (ASK-OSS), a Department of Education, Science and Training-funded initiative that advises on open source software, Pike expressed doubts about the reliability of "expensive" hardware and software, saying commodity hardware running open source will yield a more powerful solution.

"Google is not a computer somewhere, it's a lot of computers running proprietary and open source software," he said.

"You get reliability by using replication and redundancy [so] for every machine you have multiple cross connections. The idea is if you have multiple failures you have enough replication."

Google's network has multiple levels of redundancy, including between servers, racks, and whole data centers. This redundancy is not random, as Google's software knows about the underlying infrastructure. The company's PageRank algorithm determines which pages have the most relevance. The index is then sliced up into a number of pieces, dubbed shards, which can be copied to a range of different machines.
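In rough terms, that sharding-plus-replication scheme can be pictured as follows. This is only a minimal Python sketch: the names, the replication factor and the placement rule are assumptions made for illustration, not details of Google's actual systems.

    # Minimal sketch of splitting an index into shards and replicating each
    # shard across several machines. All names and numbers are illustrative.
    from dataclasses import dataclass, field

    REPLICATION_FACTOR = 3  # assumed number of copies kept of each shard

    @dataclass
    class Shard:
        shard_id: int
        doc_ids: list                                   # slice of the index this shard holds
        replicas: list = field(default_factory=list)    # machines holding a copy

    def build_shards(doc_ids, num_shards):
        """Slice the document space into num_shards pieces (round-robin for simplicity)."""
        shards = [Shard(i, []) for i in range(num_shards)]
        for doc_id in doc_ids:
            shards[doc_id % num_shards].doc_ids.append(doc_id)
        return shards

    def assign_replicas(shards, machines):
        """Spread copies of each shard across distinct machines."""
        for shard in shards:
            for r in range(REPLICATION_FACTOR):
                # stagger placement so two copies of one shard never share a machine
                shard.replicas.append(machines[(shard.shard_id + r) % len(machines)])
        return shards

    if __name__ == "__main__":
        machines = [f"machine-{i}" for i in range(10)]
        for s in assign_replicas(build_shards(range(1000), num_shards=5), machines):
            print(s.shard_id, len(s.doc_ids), s.replicas)

The point of the placement rule is simply that losing any single machine, or even a whole rack, still leaves at least one live copy of every shard elsewhere.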

The Web search index and Web documents have the same structure and are stitched together with replication. Miscellaneous services, like spell checking and ad serving, run in parallel.

"More than 1000 machines will process some part of [a person's] query and a lot are just there for redundancy," Pike said, adding the Google File System (GFS) is another service.

"We want to scale as we have to serve millions of users [and] replication is how you scale," he said.

"If we had one data center serving the whole world there wouldn't be enough capacity. This distributed nature works well."

Google is tight-lipped about exactly how many servers it runs; the last count, some years ago, put the figure at 10,000, and there is now speculation that it could be in the hundreds of thousands.

However, Pike did say that nowadays Google's servers are custom-designed and run Red Hat Linux with a modified kernel.

"Failures happen no matter what you do, but if you plan for it you can survive," he said.

Pike reminisced about one such failure, when human error accidentally reset an entire rack of GFS servers, about 50 machines holding 10TB of disk, wiping the disks clean.

But with all the reliability features coded into Google's software, the only way the engineers knew it had happened at all was that the cluster slowed down and network traffic in the "cell" spiked as the system balanced itself back to its original state.
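That self-healing behaviour, where the software notices under-replicated data and quietly re-copies it from surviving machines, is sketched below. The target replica count and every name are assumptions for illustration rather than details Pike gave; the copy traffic such a loop generates is the "spike" the engineers observed.

    # Rough sketch of re-replication after a rack of machines is lost.
    TARGET_REPLICAS = 3  # assumed desired number of copies per shard

    def rebalance(shard_replicas, dead_machines, spare_machines):
        """Return the copy operations needed to restore full replication."""
        copies = []
        for shard, replicas in shard_replicas.items():
            survivors = [m for m in replicas if m not in dead_machines]
            if not survivors:
                continue                    # every copy lost; nothing to recover from
            needed = TARGET_REPLICAS - len(survivors)
            for target in spare_machines[:needed]:
                copies.append((shard, survivors[0], target))  # (what, from, to)
                survivors.append(target)
            shard_replicas[shard] = survivors
        return copies

    if __name__ == "__main__":
        shard_replicas = {
            "index-shard-0": ["rack1-m1", "rack2-m1", "rack3-m1"],
            "index-shard-1": ["rack1-m2", "rack2-m2", "rack3-m2"],
        }
        dead = {"rack1-m1", "rack1-m2"}     # the rack that was wiped
        print(rebalance(shard_replicas, dead, ["rack4-m1", "rack4-m2"]))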

Pike said he doesn't believe Google could have achieved this level of reliability with Windows.

"It was a good decision [to choose Linux] because it made it so flexible in the way you can do it," he said.

"We administer a massive cluster with remarkably few people. It would have been too hard to do what we did without open source."

Pike then joked that if Microsoft's claims about how many Windows administrators are required per server, compared with Linux, were true, "Google would be a much bigger company".