How to Implement Next-Generation Storage Infrastructure for Big Data

16.04.2012

When Day joined Shutterfly in 2009, storage had already become one of the company's biggest buckets of expense, and it was growing at a rapid clip, not just in raw capacity but in staffing.

"Every n petabytes of additional storage meant we needed another storage administrator to support that physical and logical infrastructure," Day says. With such massive data stores, he says, "things break much more frequently. Anyone who's managing a really large archive is dealing with hardware failures on an ongoing basis. The fundamental problem that everyone is trying to solve is, knowing that a fraction of your drives are going to fail in any given interval, how do you make sure your data remains available and the performance doesn't degrade?"
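The arithmetic behind Day's point is simple: in a large enough fleet, drive failures stop being rare events and become a steady background rate. The sketch below assumes failures are independent and occur at a constant annualized failure rate (AFR). The function name, the 10,000-drive fleet, and the 2% AFR are hypothetical values chosen for illustration, not Shutterfly figures.

```python
def expected_failures(num_drives: int, annual_failure_rate: float, days: float) -> float:
    """Expected number of drive failures over a window of `days`.

    Assumes (hypothetically) that failures are independent and spread evenly
    across the year at a constant annualized failure rate.
    """
    return num_drives * annual_failure_rate * (days / 365.0)


# Hypothetical fleet: 10,000 drives with an assumed 2% annualized failure rate.
per_week = expected_failures(10_000, 0.02, 7)
print(f"about {per_week:.1f} expected drive failures per week")
```

Even with a modest failure rate, a fleet of this size would see several failed drives every week, which is why availability during recovery becomes the central design question.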

The standard answer to drive failure is redundancy, usually in the form of RAID arrays. But at massive scale, RAID can create more problems than it solves, Day says. In a traditional RAID scheme, each piece of data is either mirrored onto other disks in the array or protected by parity information spread across them, which preserves integrity and availability when a disk fails. But that protection is expensive: a single piece of data that is stored and mirrored can inflate to require more than five times its size in storage. And as the drives used in RAID arrays get larger (3-terabyte drives are very attractive from a density and power-consumption perspective), the time it takes to rebuild a replacement for a failed drive and restore full parity grows longer and longer.
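How much raw capacity redundancy costs depends on the RAID level and on how many full copies of the data are kept. The sketch below compares mirroring (RAID 1) with single-parity (RAID 5) and double-parity (RAID 6) groups. The group size of 8 disks and the `copies` parameter (for example, an entire array replicated to a second site) are hypothetical assumptions for illustration; the article does not describe Shutterfly's actual layout.

```python
def raw_capacity_needed(usable_tb: float, scheme: str,
                        disks_in_group: int = 8, copies: int = 1) -> float:
    """Raw capacity (TB) needed to hold `usable_tb` of data.

    `scheme` is one of "raid1", "raid5", "raid6". `copies` is a hypothetical
    count of full replicas of the whole array (e.g. a second site).
    Ignores hot spares, filesystem overhead, and formatting losses.
    """
    if scheme == "raid1":
        # Every block is mirrored onto a second disk: 2x overhead.
        per_copy = usable_tb * 2
    elif scheme == "raid5":
        # One disk's worth of parity per group of `disks_in_group` disks.
        per_copy = usable_tb * disks_in_group / (disks_in_group - 1)
    elif scheme == "raid6":
        # Two disks' worth of parity per group.
        per_copy = usable_tb * disks_in_group / (disks_in_group - 2)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return per_copy * copies


for scheme in ("raid1", "raid5", "raid6"):
    print(scheme, round(raw_capacity_needed(100, scheme), 1), "TB raw for 100 TB usable")
```

Parity schemes are far cheaper than mirroring within a single array, but overheads multiply as full copies are added: under these assumptions, a mirrored array kept in three copies needs six times its usable capacity.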

"We didn't actually have operational issues with RAID," Day says. "What we were seeing was that as drive sizes became larger and larger, the time to get back to a fully redundant system when we had any component failure was going up. Generating parity is proportional to the size of the data set that you're generating it for. What we were seeing as we started using 1-terabyte and 2-terabyte drives in our infrastructure was that the time to get back to full redundancy was getting quite long. The trend wasn't heading in the right direction."
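Day's observation that rebuild time tracks drive size follows from the fact that every byte of the replacement drive must be reconstructed and rewritten. The sketch below computes a lower bound on rebuild time from drive capacity and a sustained rebuild rate. The 50 MB/s rate is a hypothetical assumption; real arrays vary widely and often throttle rebuilds to protect foreground I/O, which lengthens the window further.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower bound on hours to rebuild a replacement drive.

    The whole drive must be rewritten, so time scales linearly with
    capacity. Uses decimal units (1 TB = 1,000,000 MB), as drive vendors do.
    """
    drive_mb = drive_tb * 1_000_000
    return drive_mb / rebuild_mb_per_s / 3600


# Assumed sustained rebuild rate of 50 MB/s (hypothetical).
for tb in (1, 2, 3):
    print(f"{tb} TB drive: ~{rebuild_hours(tb, 50):.1f} hours to full redundancy")
```

Because the relationship is linear, doubling the drive size doubles the window during which the array runs without full redundancy and is exposed to a second failure.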