US orders massive supercomputer to manage nuclear stockpile

03.02.2009
The U.S. government has commissioned IBM to build a massive supercomputer that will have 1.6 million processor cores and be 15 times faster than today's most powerful machine, IBM announced Tuesday.

The "Sequoia" supercomputer is scheduled for operation in 2012 and will be able to perform at 20 petaflops, or 20,000 trillion floating point operations per second, IBM said. Today's most powerful machine, IBM's Roadrunner at the Los Alamos National Laboratory, can manage 1.1 petaflops.

Sequoia will be based on IBM's Blue Gene/Q supercomputer, which is still under development. Ordered by the U.S. Department of Energy, it will be located at the Lawrence Livermore National Laboratory in California and used primarily to manage the U.S.'s aging stockpile of nuclear weapons.

Those weapons contain highly corrosive and radioactive materials, and Sequoia will allow scientists to perform simulations to help determine whether the weapons are stable and safe, and whether they will work properly should the government decide to use them.

"The problem we have with the nuclear stockpile is similar to one you might have at home with a car you've kept in the garage for 20 to 30 years," said Mark Seager, assistant department head for advanced technology at Lawrence Livermore. "How do you carefully maintain the car as it ages so that when you go to start the car, you can be very confident it will start? That the probability that it won't start is less than 1 in a million? That's a pretty high level of certitude."

The scientists have been working on the problem for several years with IBM's supercomputer, but they need a more powerful system to explore areas of physics they have not yet tackled and calculate the margin of error for results, Seager said.

Sequoia will occupy 96 server racks over an area a bit larger than a tennis court. IBM won't discuss the machine in detail because it is still being developed, but Dave Turek, vice president of IBM's Deep Computing initiative, said it will be similar in design to its predecessor, Blue Gene/P, but on a much larger scale. The system will run a version of the Linux OS, use IBM's embedded Power processors and have 1.6 petabytes of main memory.

Because a computer this size has never been built, scaling the processor count, memory DIMMs and management subsystems comes with a level of uncertainty, Turek acknowledged. "This is not an exercise for the faint of heart," he said. "When you push the limits of scalability you start to observe problems that were simply unanticipated."

Among IBM's challenges will be how to scale the management subsystems to automate as many tasks as possible, and to allow administrators to keep track of workloads and make the right choices during operation.

Lawrence Livermore will have to write applications that can take advantage of such a massively parallel system. It chose IBM's embedded processors because they are "easier to deal with on our complicated weapons code" than the Cell processors used for Roadrunner, Seager said.

Sequoia will be far more energy efficient than a Blue Gene/P system, according to Turek, but because of its size Lawrence Livermore will still have to double the power supply to its computing center. Sequoia will require 6 megawatts of power, compared to 1.8 megawatts for ASC Purple, Seager said.

IBM was picked from five bidders because its costs were lower and it provided a better "risk reduction plan" -- essentially a backup plan if something goes wrong, Seager said. He declined to name the losing bidders but said it was a close contest.

The price tag for the system won't be disclosed until a later date, but such computers can easily run into hundreds of millions of dollars.

Besides managing nuclear weapons, Sequoia will be used for research into astronomy, energy, the human genome and climate change, IBM said. The system will allow forecasters to predict local weather events that are less than 1 kilometer across, it said, compared to 10 kilometers today.

While Sequoia is being built, Lawrence Livermore will use a smaller IBM supercomputer called Dawn to develop the weapons applications that will run on it. Dawn will be operational in the coming months and perform at 500 teraflops.

It's not certain that Sequoia will be the most powerful supercomputer in the world by the time it goes into operation, but Turek sounded confident that it will be.

"We expect it to be, Livermore expects it to be," he said. "At this rarefied level of computing there are few clients around the world looking to make the investment on this scale."