Data hoarders pay high price

20.06.2006
Moore's Law states that computing power, as measured by the density of microcomponents per square inch on an integrated circuit, will double every 18 months. Another way of looking at this is that the cost of a given level of compute power will decline by 50 percent in that period.

In storage, we have experienced an equally dramatic transformation in disk capacity. Back in the early '80s, the first hard drive that I owned was a 5MB Apple ProFile that sold for US$2,000. Since then, the unit cost of storage has fallen from several hundred dollars per megabyte to around 1/25th of one cent today.
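To put those two price points side by side, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above (US$2,000 for 5MB, and 1/25th of one cent per megabyte); the variable and function names are illustrative, not from any particular tool.

```python
import math

# Figures quoted in the text: a 5MB drive that sold for US$2,000.
PROFILE_PRICE_USD = 2000.0
PROFILE_CAPACITY_MB = 5.0

# "1/25th of one cent" per megabyte, expressed in dollars.
COST_PER_MB_2006_USD = 0.01 / 25


def cost_per_mb(price_usd, capacity_mb):
    """Unit cost of storage in dollars per megabyte."""
    return price_usd / capacity_mb


cost_then = cost_per_mb(PROFILE_PRICE_USD, PROFILE_CAPACITY_MB)

# How many times cheaper a megabyte has become.
decline_factor = cost_then / COST_PER_MB_2006_USD

# Number of successive 50 percent price cuts that add up to that decline.
halvings = math.log2(decline_factor)

print(f"Then: ${cost_then:.2f}/MB, now: ${COST_PER_MB_2006_USD:.4f}/MB")
print(f"Decline: about {decline_factor:,.0f}x, or {halvings:.1f} halvings")
```

On these numbers the unit cost has dropped by a factor of roughly one million, which is the equivalent of about 20 successive halvings.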

Despite this extraordinary decline in unit cost, demand for storage capacity continues to grow, and, more important, accompanying management and operational costs keep rising. In many environments, this increase is far outpacing the unit cost reduction from Moore's Law. As a result, significant effort is now focused on flattening the trajectory of the cost growth curve.

One major contributing factor to the increase in operational cost is the long-term retention of data. We have become hoarders of data. This is in part a response to legitimate business, regulatory, and litigation concerns, but a big reason is that it's just too difficult to separate the wheat from the chaff. Distinguishing data that is currently needed from data that is disposable, and then cleanly extracting the latter, is simply not doable in many situations.

As a result, an ever-increasing percentage of storage budgets is being spent to service a growing mass of largely inactive retained data. This is, essentially, a storage run rate: a base level of cost that can be extrapolated out into the future and upon which additional growth is layered. This run rate includes not only primary storage but backup as well, since this data must also be repeatedly backed up and retained. The result is longer backup windows, slower recovery, and increased numbers of tapes and libraries.
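The run rate idea can be sketched as a simple projection. The Python below is only an illustration under stated assumptions: the function name, its parameters, and every number in the usage example are hypothetical, and the cost per terabyte is held flat for simplicity even though real unit costs fall over time.

```python
def project_run_rate(retained_tb, new_tb_per_year, cost_per_tb_year,
                     backup_copies, years, disposal_fraction=0.0):
    """Project the yearly cost of servicing retained data.

    All inputs are hypothetical planning figures, not measured values.
    disposal_fraction is the share of retained data destroyed each year
    under an archiving and end-of-life policy (0.0 means keep everything).
    """
    costs = []
    for _ in range(years):
        # Each retained terabyte is paid for once on primary storage
        # and once more for every backup copy that is kept.
        costs.append(retained_tb * cost_per_tb_year * (1 + backup_copies))
        # Next year's base: what survives disposal, plus newly added data.
        retained_tb = retained_tb * (1 - disposal_fraction) + new_tb_per_year
    return costs


# Hypothetical shop: 100TB retained, 20TB added per year,
# $1,000 per TB per year, two backup copies, three-year horizon.
hoarding = project_run_rate(100, 20, 1000, 2, 3)
with_policy = project_run_rate(100, 20, 1000, 2, 3, disposal_fraction=0.25)

print("Keep everything:  ", hoarding)
print("Dispose 25%/year: ", with_policy)
```

With nothing ever deleted, the base cost climbs every year before any new growth is counted; in this made-up case, even a modest disposal policy is enough to bend the curve downward.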

In New York City, there is a billboard that keeps a running tally of a similar run rate -- the national debt. (As I write this, it stands at $8,381,882,122,407.63 and counting.) The storage run rate differs from the national debt in that managing and reducing it is doable, but doing so requires something that is not found in many places -- an effective set of archiving, end-of-life, and data destruction policies.