Capacity in the Cloud

July 13, 2009
Jeff Kubacki, CIO at risk management consulting firm Kroll Inc., has set a goal of cutting the company's storage costs by 25% over the next three years. With some 13 petabytes of stored data to date, Kubacki plans to attack the problem with a mix of tiered storage, business process changes and newer options -- including cloud storage.

Though in its infancy, cloud storage seems like an attractive option, with its elasticity, utility-like billing, multiple storage locations and ability to pull data directly from the storage device. But the cloud is still uncharted territory when it comes to sending large chunks of data through the ether.

"Cloud is one of those things that we've been talking to our vendors about to see when it might make sense for us to put our toe in the water," Kubacki says. "We're still just figuring out if it's going to be right for us."

Kroll's IT architects will investigate ways to migrate about 25% of the firm's eligible data through its Internet "pipes" and into the cloud. (The majority of the data, mostly legal discovery documents, is considered too sensitive to store in the cloud, Kubacki says.) While storage capacity in the cloud is expandable, limits on the capacity of network connections to the cloud can create challenges for enterprises with multiple petabytes of data to move back and forth.

Enterprises are asking whether their pipes are big enough to transfer their stored data to the cloud, and often, the answer is no. "The latency is the big inhibitor for what you can use [cloud] storage for," says Adam Couture, an analyst at Gartner Inc. "Right now, for enterprises, we see the [use restricted to] archiving, backup, maybe some collaboration."

But most cloud providers say there are easy ways around capacity issues when migrating data to the cloud -- starting with the physical migration of the initial data to the data center location.

It's relatively easy to host and transfer large amounts of data from a day-to-day, user-level perspective, says Rob Walters, general manager of the Dallas office of cloud hosting company The Planet. But moving 20TB to 25TB of data in a single chunk remains beyond what today's networks handle well. "The networks that we have [today] just aren't good at it. It's just a weak point right now, and everybody is looking at dealing with that," Walters says.

For enterprises, the "initial ingestion" of backup data to the cloud can be done by copying data to the cloud over a WAN or LAN link, but "that initial backup, depending on how much data you have on your server, could take weeks," Couture cautions.
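The arithmetic behind that caution is easy to check. The Python sketch below uses assumed figures -- a 10TB data set, a 45Mbit/sec. T3 line and 70% usable bandwidth -- to show how an initial backup stretches into weeks:

```python
# Back-of-the-envelope transfer-time math. The data set size, link speed
# and 70% efficiency factor are illustrative assumptions, not vendor figures.

def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Days needed to move data_tb terabytes over a link_mbps link,
    discounted for protocol overhead and contention (efficiency)."""
    bits = data_tb * 1e12 * 8                  # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400                    # seconds -> days

# A 10TB initial backup over a 45Mbit/sec. T3 line:
print(f"{transfer_days(10, 45):.0f} days")     # roughly 29 days
```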

Doctors' offices that hire Arvada, Colo.-based Nuvolus to create private cloud storage for their sensitive patient data don't like data to be copied and physically taken out of their offices, says Nuvolus CEO Kevin Ellis. So the company requires its health care industry clients to have "a decent Internet connection" -- typically 500Kbit/sec. -- to transfer the backup data over the pipes, Ellis says.

"Depending on the office, we could be looking at pretty long upload times," he says. "You're uploading overnight. We're trying to make sure we're not impacting the doctor's office during the day as well."

Some vendors also offer private connections established from the enterprise to one of the provider's storage nodes. This is well suited for companies with initial data sets between 2TB and 75TB, or fewer than 750 million files, and where data transfer is time-sensitive, according to Nirvanix Inc., a San Diego-based cloud storage provider. It also works well for one-time and ongoing data migration that requires high throughput and moderate latency.

The other option -- most often used by enterprises -- is the "sneakernet" approach, where data is physically picked up from the customer on a disk, tape or appliance provided by the cloud storage provider, and taken to the data center for initial backup.
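Measured as effective throughput, shipping hardware can beat even a fat network pipe, which is why sneakernet persists. The comparison below uses assumed figures -- a 25TB appliance, a two-day courier run and the same T3 line as above:

```python
# Effective throughput of the "sneakernet" route versus a WAN upload.
# The 25TB appliance, two-day courier and T3 figures are assumptions.

def effective_mbps(data_tb: float, transit_hours: float) -> float:
    """Throughput, in Mbit/sec., of delivering data_tb terabytes in transit_hours."""
    return data_tb * 1e12 * 8 / (transit_hours * 3600) / 1e6

courier = effective_mbps(25, 48)   # 25TB appliance on a two-day courier run
t3_line = 45 * 0.7                 # 45Mbit/sec. T3 at an assumed 70% utilization
print(f"courier: {courier:,.0f} Mbit/sec. vs. T3: {t3_line:.1f} Mbit/sec.")
# courier: 1,157 Mbit/sec. -- more than 35 times the T3's usable bandwidth
```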

"We've had customers that have shipped storage arrays," says Jon Greaves, chief technology officer at private cloud host Carpathia Hosting Inc. in Ashburn, Va. "In some cases, customers have physically removed disks from the chassis after they have been mirrored, and delivered those."

Nirvanix, for instance, will send its customers storage servers with dual Gigabit Ethernet ports to transfer data within their own facilities. Once the data is transferred, the servers are sent back to Nirvanix and the data is migrated to the cloud.

Amazon Web Services LLC supports moving large amounts of data into and out of its cloud using portable storage devices. It uses a high-speed internal network to transfer customer data directly onto and off of storage devices, bypassing the Internet.

Greaves has seen large companies use both the Internet and sneakernet methods for data transfer.

Carpathia builds private clouds for its enterprise customers based on technology from ParaScale Inc. "It depends on how quickly they need to see data up and running, and the use of the data. If it's long-term archiving, it's typically a more gradual migration of data," he explains. "If they need video files for immediate use, and it's tens to hundreds of terabytes, that's the time we start looking at alternative methods."

After that initial transfer, the demand on Internet bandwidth drops off, because only blocks of data that have changed since the last backup are sent.
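Vendors differ in the details, but the basic changed-block technique looks something like the following sketch; the 4MB block size and SHA-256 hashing are assumptions for illustration, not any particular product's design:

```python
# A generic changed-block scheme: hash fixed-size blocks of a file and
# re-upload only the blocks whose hashes differ from the previous run.

import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB blocks (an assumed size)

def block_hashes(path: str) -> list[str]:
    """SHA-256 digest of each fixed-size block in the file."""
    hashes = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def blocks_to_upload(previous: list[str], current: list[str]) -> list[int]:
    """Indices of blocks that are new or have changed since the last backup."""
    return [i for i, digest in enumerate(current)
            if i >= len(previous) or previous[i] != digest]
```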

There is no such thing as ultimate scalability or infinite capacity in the cloud, Walters says. It's the provider's responsibility to plan capacity, manage delivery of future storage and stay ahead of demand. "If someone is going to upload 10-plus terabytes [of data], you know about that in advance, and it's a carefully orchestrated exercise," he says.

Storage providers use sophisticated methods for capacity planning. Carpathia, for instance, constantly pushes traffic across its network at about 450Gbit/sec. to 500Gbit/sec. and plans for capacity changes using algorithms borrowed from the telecommunications industry.

"You have a T1 line and have to figure out how many core minutes you can squeeze through that T1 line, which is really an overprovisioning problem," Greaves explains.

Telecom companies also use a unit of measure called an erlang, which describes total traffic volume in one hour, to help determine where they are in the provisioning cycle. "We use exactly the same approach on our cloud," Greaves says. "We can figure out that we're at 1.2, and at 2 we're going to have capacity challenges. So when we hit that 1.2 threshold, that's when we order more hardware."
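In code, that check is almost trivial. The sketch below takes the 1.2 reorder point and the capacity ceiling of 2 from Greaves' description; how the hourly load is actually measured is an assumption:

```python
# Erlang-style provisioning check. The 1.2 reorder point and the ceiling
# of 2 come from Greaves' description; the measured load here is invented.

def erlangs(busy_seconds: float, window_seconds: float = 3600) -> float:
    """Average concurrent load over the window: 1 erlang equals one
    channel (or unit of capacity) kept busy for the entire hour."""
    return busy_seconds / window_seconds

REORDER_AT = 1.2   # order more hardware at this load...
CEILING = 2.0      # ...well before capacity problems begin here

load = erlangs(busy_seconds=4500)  # e.g., 4,500 busy-seconds in the last hour
if load >= REORDER_AT:
    print(f"load is {load:.2f} erlangs -- time to order more hardware")
```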

For Kroll, a cloud storage decision will wait until 2010. "I never like to be on the bleeding edge. [But] I don't mind the leading edge," Kubacki says.

But he adds that cloud storage will still be an attractive option next year. "I think one benefit of moving to the cloud would be the whole concept of it being more of an expense transaction versus a capital transaction," Kubacki says. "Today I have a large capital budget; I'm buying my disk and depreciating it over a number of years. So I'm kind of shifting what my P&L looks like by having some of that data in the cloud. I'm not actually buying storage; I'm almost renting it."

Collett is a Computerworld contributing writer. Contact her at stcollett@aol.com.