Different shades of grid

25.02.2005
By Ian Foster

As with most nascent enterprise technologies, early discussions around grid have focused primarily on how it will affect large system vendors' business models. While it will be interesting to watch IBM Corp.'s and Sun Microsystems Inc.'s "grid as a commodity" strategies progress, let's not forget that the utility computing business model is far from the full extent of the grid discussion.

The future is not just about huge brokers outsourcing grid for core IT operations. If we think about a continuum in which companies are either grid brokers or grid consumers, there are -- pardon the pun -- many shades of grid in between.

One area that will be particularly interesting to watch is how enterprise service providers incorporate grid into their production environments, thereby extending the benefits of grid to their customers via their products and services. For this column, I spoke with three providers (each serving a different industry) and learned how they have incorporated grid into their production environments.

Acxiom

Little Rock, Ark.-based Acxiom Corp. provides systems and services to organizations that need to process and analyze massive amounts of data quickly and accurately. In 2000, Acxiom faced a significant scale issue when sales suddenly spiked around its data integration application AbiliTec, which ran on large Unix systems.

"We were having tremendous growth not only in the amount of customers, but in the number of files and records they wanted to run through the AbiliTec application," said Terry Talley, chief architect for Acxiom's products and infrastructure technology group.

To scale its production environment, Acxiom needed a very large memory space and many additional processors. "But we realized that financially, it didn't make sense to scale by adding an arbitrary number of expensive SMPs (symmetric multiprocessors)," said Talley. "The only way to go was commodity boxes."

The company replaced SMPs with Linux nodes with dual CPUs (typically in the 3-GHz range) and 4GB of RAM. The company also realized early on that it would need some way to manage this environment. After surveying available commercial management tools, it decided instead to build its own management console, which it dubbed "Apiary." The results obtained with this homegrown grid have been impressive.

"Historically, when we ran our software on conventional platforms, we'd jump through hoops to get a 5 percent gain in performance on a particular application," said Alex Dietz, CIO at Acxiom. "With grid, we go 10 times faster, and we could go 100 times faster, if we decided to. The incremental scalability of grid blows your mind."

Today, Apiary manages a grid of more than 6,000 nodes, all running Linux. Acxiom calls Apiary's dynamic provisioning capabilities "hive for hire" -- and the grid production environment now processes more than 50 billion AbiliTec links per month, at a 10x throughput improvement for its large batch jobs. And reportedly, the company's AbiliTec customers have experienced zero outages in four years (100 percent availability).

"Some people are concerned about the reliability of the grid," said Talley. "But in our case, by nature of the grid, we can afford to have redundant services and redundant capabilities. We've had machines fail, but because of our grid's automatic failover capabilities -- the applications in our grid environment have not gone down once in four years."

That's a message I've heard from other early adopters: Grid can not only increase performance and reduce costs, but also improve reliability.

All of Acxiom's services are now deployed on the grid, and many of the company's workflow processes also run there. Acxiom is also pushing its data warehousing and data mart solutions onto the grid.

Bowne

As the world's largest financial printer, New York-based Bowne & Co. Inc. sees enormous spikes in IT resource demand when its clients' SEC deadlines roll around. During these peak annual financial processing timeframes, Bowne receives large numbers of end-customer statements that it must process in a very short time. At the heart of this production schedule is Bowne's Statements application -- custom-built software running on hundreds of dedicated servers.

In 2003, Bowne launched a proof-of-concept grid pilot, investigating ways to leverage grid principles to address the low utilization rate of the resources that support its Statements application.

"At the onset of the project, we asked ourselves, 'How big should the grid be?'" said Ellen Kraus, chief architect at Bowne. "There's no right or wrong answer to that question. What we decided at Bowne was that we weren't going to try to boil the ocean and build the biggest production grid ever. We just wanted to borrow some of the grid techniques to break the static link between application and server so that we can increase our processing flexibility."

So Bowne built a pilot production grid consisting of four servers. One, a DataSynapse LiveCluster server, was designated as the job controller, which controlled two grid engines. The fourth server was designated to feed jobs through these grid engines.

With Bowne's grid, when a job is created, it's sent to the controller, which determines which grid servers are available. Rather than running dedicated jobs on dedicated servers and processing them serially, the pilot grid enables multithreading and dynamic provisioning. Kraus estimated that the grid provided possible 8x gains in utilization; she also indicated that the grid could easily scale to accommodate an additional 30 or 40 engines.
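The dispatch pattern described here, in which a controller hands each new job to whichever engine is free rather than to a fixed server, can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions, not the controller software Bowne actually used; the `Controller` class, the engine names, and the job names are all hypothetical.

```python
from collections import deque


class Controller:
    """Hypothetical job controller: tracks idle engines and hands out jobs."""

    def __init__(self, engine_names):
        self.idle = deque(engine_names)  # engines with nothing to do
        self.pending = deque()           # jobs waiting for an engine
        self.running = {}                # engine name -> job it is processing

    def submit(self, job):
        """Queue a job and start it at once if any engine is idle."""
        self.pending.append(job)
        self._dispatch()

    def finish(self, engine):
        """Mark an engine's job as done and reuse the engine immediately."""
        job = self.running.pop(engine)
        self.idle.append(engine)
        self._dispatch()
        return job

    def _dispatch(self):
        # Pair waiting jobs with idle engines until one side runs out.
        while self.pending and self.idle:
            engine = self.idle.popleft()
            self.running[engine] = self.pending.popleft()


# Two engines, three jobs: the third job waits for the first free engine.
controller = Controller(["engine-1", "engine-2"])
for job in ["statement-A", "statement-B", "statement-C"]:
    controller.submit(job)

controller.finish("engine-1")  # statement-C now starts on engine-1
```

In this toy run no job is tied to a particular server: the third job simply takes the first engine that frees up, which is the "static link" between application and server that Kraus wanted to break.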

While Bowne's grid pilot was successful, Kraus observed some challenges that grid will face in other enterprise settings. For starters, while Bowne's Statements application is custom-built, and therefore exempt from licensing issues, she sees licensing as an area that needs to be addressed before grid can tackle some proprietary application environments. She also cited cultural issues with clients -- many of whom, no matter how extensively grid security issues might be addressed in the short term, will insist on having their financial information processed on separate servers.

Kraus also cited cultural issues with IT professionals. "Grid computing is a paradigm shift, and it's not just a single issue that IT has to deal with," said Kraus. "It's not just about changing how you provision servers, or how you do development. You need a different skill set, new abilities on monitoring and measuring and a different deployment mechanism to know what type of environment you want to provision. It's not that tough to understand the business value. But the cultural issues are tougher to reconcile."

GlobeXplorer

Walnut Creek, Calif.-based GlobeXplorer LLC delivers what it calls the world's largest online library of aerial/satellite imagery and maps. If you go to its Web site and type in your address, you'll find an aerial photo of your neighborhood. GlobeXplorer serves this content and more to partners nationwide, and the demand that both serving and ingesting these large files places on its IT resources is significant. GlobeXplorer handles more than 500TB of data on a regular basis.

"At the same time we're serving millions of maps a day to our customers, we're ingesting new content," said Rob Shanks, president and CEO of GlobeXplorer. "In our business, you have to keep the content fresh, so we're always ingesting. When you're providing aerial and satellite imagery of all of North America, and expanding to the entire world, it's a massive undertaking to deal with the related file formatting, file converting, changing color contrast, stitching to neighboring images, and serving to image servers. We're ingesting a terabyte a week at a time in our grid."

GlobeXplorer uses grid as the way to manage its CPU cycles, controlling the queuing and prioritization of jobs. If servers are needed for a lot of image serving for a particular area, GlobeXplorer's grid will slow down the processing of new data and deal with the paying customers first.

"We really use grid to manage this supercomputer system -- hundreds of CPUs turned into one virtual supercomputer," said Shanks. "We can assign on each CPU what path we want to give priority, and when we want things to happen. Because it's so dynamic, we can do a lot more with less hardware resources."

Because GlobeXplorer's grid system is built on open-source standards such as the Globus Toolkit, the company can make modifications with relative ease. GlobeXplorer's grid hardware environment consists of a mixture of Sun SPARC boxes and Dell Linux boxes -- all of the thin, rack-mounted variety.
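The queuing and prioritization Shanks describes, in which image serving for paying customers takes precedence over ingesting new data, can be illustrated with a simple priority queue. The Python sketch below is an assumption-laden toy, not GlobeXplorer's scheduler and not a Globus Toolkit API; the `PriorityScheduler` class, the `SERVE` and `INGEST` priority levels, and the job names are hypothetical.

```python
import heapq
import itertools

# Assumed priority levels: a lower number is served first.
SERVE, INGEST = 0, 1


class PriorityScheduler:
    """Hypothetical scheduler: serving jobs always run before ingest jobs."""

    def __init__(self):
        self._heap = []
        # Counter breaks ties so jobs of equal priority stay in arrival order.
        self._counter = itertools.count()

    def submit(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self):
        """Return the name of the most urgent job, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


sched = PriorityScheduler()
sched.submit(INGEST, "convert-tile-17")
sched.submit(SERVE, "map-request-1")
sched.submit(INGEST, "stitch-tile-18")
sched.submit(SERVE, "map-request-2")

# Both map requests come out first, then the ingest work in arrival order.
order = [sched.next_job() for _ in range(4)]
```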

Where many enterprises are just beginning to investigate grid and virtualization techniques, GlobeXplorer has been using these principles for years.

"We actually started with this virtual system in the beginning," said Shanks. "We always built our system in a network mode, such that we can have servers in Japan or London or San Francisco, and they all work as one system. Whether you're inside our cage in our data center or external, this network sees everything together -- the tape, the drive, everything. Everything is database-driven. Our machines are interconnected by a network, always built to run four or less CPU provisioning requirements, so we don't have to purchase multimillion-dollar, massive servers."

GlobeXplorer recently performed external processing and delivery of up-to-date imagery for the tsunami disaster area, with the help of some of the company's satellite partners.

Grid pioneer Ian Foster is a board member at the Globus Consortium, a vendor-neutral, nonprofit organization promoting the open-source Globus Toolkit in the enterprise. He can be reached at foster@mcs.anl.gov.