Army makes major Linux HPC cluster move

February 13, 2006
A U.S. Army supercomputing center with a legacy that dates to the first large computer, the Electronic Numerical Integrator and Computer (ENIAC), launched in 1946, is moving to Linux-based clusters in a major hardware purchase that will more than double its computing capability.

The Army Research Laboratory Major Shared Resource Center (MSRC) is buying four Linux Networx Inc. Advanced Technology Clusters, including a system with 4,488 processing cores across 1,122 nodes, with each node made up of two dual-core Intel Xeon chips. A second system has 842 nodes.

In total, the purchase will increase the center's computing capability from 36 trillion floating-point operations per second (TFLOPS) to more than 80 TFLOPS, Army officials said.

The MSRC, which is based at the Aberdeen Proving Ground in Harford County, Md., has been involved in every aspect of computing technology since its beginning, and this decision to move into commodity clusters was not made quickly, said Charles J. Nietubicz, director of the MSRC.

The lab held a symposium in 2003 to explore the issue and began running a small, 256-processor cluster system. "We saw that cluster computing was this new kid on the block and was interesting," said Nietubicz, but the center wasn't about to start scrapping its other systems made by Silicon Graphics Inc., Sun Microsystems Inc. and IBM, he said.

The MSRC isn't disclosing the purchase price, but Earl Joseph, an analyst at IDC in Framingham, Mass., said the average cost for a cluster works out to about US$2,000 per processor compared with $12,000 per processor for a RISC-based system.
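At those rates, for example, a 1,000-processor cluster would cost roughly $2 million, versus about $12 million for a RISC system with the same processor count.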

Nietubicz said other vendors "are going to have to begin to recognize that either they provide some other kind of performance to try to gain the increased price, or they are going to have to reduce the price to provide equivalent performance."

Bluffdale, Utah-based Linux Networx builds systems using Advanced Micro Devices Inc. and Intel Corp. chips. In addition to the four systems sold to the MSRC, it also sold one to the Dugway Proving Ground. In total, the sale of the five systems is the company's largest supercomputing order ever. The sale was announced Monday.

Nietubicz said he was convinced that clusters can work based on the MSRC's ability to get certain computational codes, used in fluid dynamics, structural mechanics and other disciplines, to scale to multiple processors, mostly by using code based on the Message Passing Interface (MPI) protocol. MPI is used to create parallel applications.
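MPI programs typically follow a single-program, multiple-data pattern: every node runs the same executable, learns its own rank, works on its share of the data and exchanges results over the interconnect. As a rough illustration only, not the MSRC's actual code, a minimal C sketch of that pattern might look like this:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    /* Hypothetical workload: each process sums a strided slice of 0..N-1. */
    const long N = 1000000;
    long local_sum = 0;
    for (long i = rank; i < N; i += size)
        local_sum += i;

    /* Combine every process's partial result on rank 0. */
    long total = 0;
    MPI_Reduce(&local_sum, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum 0..%ld = %ld\n", N - 1, total);

    MPI_Finalize();
    return 0;
}
```

Real fluid-dynamics or structural-mechanics codes replace the toy loop with domain decomposition over a simulation grid, but the scaling mechanism is the same message passing.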

The major competitor to supercomputing clusters and their distributed memory is symmetric multiprocessing, or SMP, a shared-memory design used primarily in RISC-based machines. Of last year's total $9.1 billion high-performance computing market, clusters accounted for about half of the sales, according to IDC.

A major limitation in moving to clusters is whether high-performance software can scale to multiple processors. Codes written with MPI can do so, but Joseph said that's difficult for companies to accomplish, since many off-the-shelf software packages don't use MPI. Government labs and universities, which own their own code, can usually invest the time to convert it to MPI, he said.

Nietubicz doesn't see any major limitations to clusters, and while not all codes can scale on them, he said the same objections arose when the center moved from vector machines to shared-memory systems. "In each major transition, there were always people saying, 'I can't use that, I need my old stuff.'"