Content-addressed storage systems may be at risk

28.01.2005
By Lucas Mearian
Lucas Mearian is a senior reporter at sister publication Computerworld, covering topics including Windows, the future of work, Apple and health IT.

Security experts are warning about a flawed hashing algorithm, MD5, used by some vendors for digital signatures to store data securely on increasingly popular content-addressed storage (CAS) systems. The warnings come as more companies unveil CAS systems to meet the need for disk-based backup of fixed data such as e-mails and medical images.

"It really is time for (the industry) to stop using MD5," said Dan Kaminsky, a security consultant at Avaya Inc. in Basking Ridge, N.J.

"MD5 has been a deprecated hashing algorithm for almost a decade. The U.S. government agreed."

According to Kaminsky, MD5 has been decertified for secure operations by the National Institute of Standards and Technology since at least 1998. "The industry has clung to the algorithm, partially out of inertia, partially out of scarcity of computer power," he said.

There are currently three major vendors of CAS storage: EMC Corp., Permabit Inc. in Cambridge, Mass., and Archivas Inc. in Waltham, Mass. Both EMC and Archivas use the MD5 hashing algorithm; Permabit does not.

Just this week, Storage Technology Corp. (StorageTek) announced that it would OEM Permabit's technology for e-mail archiving. And Sun Microsystems Inc. is developing its own CAS system, called Honeycomb, which is being tested by several beta customers and is slated for release toward the end of the year.

Sun wouldn't say which algorithm it will use to store data.

Kaminsky published a report last month on the MD5 algorithm pointing out that an attack could be used to create two files with the same MD5 hash, one with "safe" data and one with "malicious" data. Two different files that share a hash are known as a collision; when both are saved to the same system, the result can be data loss or the dissemination of bad data, Kaminsky said.
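To make the failure mode concrete, here is a minimal Python sketch of a deduplicating content-addressed store that trusts its hash completely. It is an illustrative assumption, not any vendor's design: the hypothetical `weak_address` function keeps only the first byte of SHA-256, so a collision turns up within a few hundred tries, standing in for the far harder MD5 attack Kaminsky described. `DedupStore` and `find_collision` are likewise hypothetical names.

```python
import hashlib


def weak_address(data: bytes) -> str:
    """Hypothetical weak hash: the first byte of SHA-256, as two hex characters.

    Only 256 addresses exist, so collisions are cheap to find. This stands in
    for MD5 purely for demonstration.
    """
    return hashlib.sha256(data).hexdigest()[:2]


class DedupStore:
    """Toy content-addressed store that assumes equal address means equal data."""

    def __init__(self, address_fn=weak_address):
        self._address_fn = address_fn
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        addr = self._address_fn(data)
        # An existing address is taken as proof of a duplicate, so the new
        # bytes are discarded without ever being compared.
        self._blobs.setdefault(addr, data)
        return addr

    def get(self, addr: str) -> bytes:
        return self._blobs[addr]


def find_collision(address_fn=weak_address) -> tuple[bytes, bytes]:
    """Try numbered documents until two distinct ones share an address."""
    seen: dict[str, bytes] = {}
    i = 0
    while True:
        candidate = f"document-{i}".encode()
        addr = address_fn(candidate)
        if addr in seen:
            return seen[addr], candidate
        seen[addr] = candidate
        i += 1


first, second = find_collision()
store = DedupStore()
addr = store.put(first)
store.put(second)                 # same address: silently treated as a duplicate
assert store.get(addr) == first   # the second document was never stored
```

Whichever file is written first wins and the other is silently dropped; if the attacker's file lands first, the legitimate one is the casualty.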

CAS systems store metadata and data along with management policies to create an object that is quickly retrievable, no matter where it's stored on a disk subsystem. CAS also uses write once, read many (WORM) capability to ensure that once data is stored it cannot be overwritten, which satisfies several regulatory requirements. Hashing is a way to create a shorter fixed-length key or index that represents the original data stored in a device. A multidigit number, for example, could be a hash representing a person's longer name or a specific document.
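The two ideas in that paragraph, a fixed-length hash used as the object's address and write-once semantics, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Centera or ArC is implemented: the hypothetical `WormStore` class keys objects by their MD5 hex digest and, as an added safeguard of our own, compares bytes on a repeat address so that a collision raises an error instead of overwriting.

```python
import hashlib


class WormStore:
    """Minimal write-once, read-many content-addressed store (illustrative only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def address(data: bytes) -> str:
        # MD5 always yields a 128-bit digest (32 hex characters), whatever the input size.
        return hashlib.md5(data).hexdigest()

    def put(self, data: bytes) -> str:
        addr = self.address(data)
        existing = self._objects.get(addr)
        if existing is None:
            self._objects[addr] = data   # first write of this content
        elif existing != data:
            # Same address, different bytes: refuse rather than overwrite.
            raise ValueError(f"hash collision at {addr}; refusing to overwrite")
        return addr                      # rewriting identical content is a no-op

    def get(self, addr: str) -> bytes:
        return self._objects[addr]


store = WormStore()
note = store.put(b"short e-mail")
image = store.put(b"\x00" * 1_000_000)   # a 1 MB stand-in for a medical image
print(len(note), len(image))             # both addresses are 32 characters long
```

A short e-mail and a megabyte of image data both get a 32-character key, which is what makes the address compact and easy to index.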

The vast majority of CAS is being purchased by the financial services and medical industries to store data regulated by the U.S. Securities and Exchange Commission under Rule 17a-4 and Health Insurance Portability and Accountability Act regulations. EMC's US$200,000 Centera CAS array uses the MD5 hashing algorithm to create a 27-character digital signature for the objects it stores.

But CAS users disagree about the dangers of using the MD5 hash.

"I believe that the possibility of a collision is so unlikely that it does not bother me. Thus far, we've been working with Centera for more than a year without a single issue," said John Halamka, chief information officer at Boston-based CareGroup Inc., a hospital management company.

Mike Kilian, chief technology officer of EMC's Centera division, said MD5 flaws don't apply to Centera because once a piece of content is stored, a company isn't allowed to change it.

"Centera from almost Day 1 has had multiple addressing schemes available to applications," Kilian said. "One of those addressing schemes added additional information to the address that guaranteed it to be differentiated. So not only did you get protection against malicious attempts to create conflicting MD5s, you get protection even against accidental or very improbable MD5 collisions."
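The article does not describe what additional information Centera adds to its addresses, so the following is only a hypothetical Python sketch of the general idea: append a random per-write suffix (here a `uuid4` value) to the MD5 digest so that two writes can never share an address, even if their contents collide. `differentiated_address` and `DifferentiatedStore` are invented names.

```python
import hashlib
import uuid


def differentiated_address(data: bytes) -> str:
    """Hypothetical address: content digest plus a random per-write suffix."""
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}-{uuid.uuid4().hex}"


class DifferentiatedStore:
    """Toy store whose addresses differ per write, even for identical content."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        addr = differentiated_address(data)
        self._objects[addr] = data
        return addr

    def get(self, addr: str) -> bytes:
        return self._objects[addr]


store = DifferentiatedStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
print(first != second)   # distinct addresses despite identical content
```

The trade-off in this sketch is that the address can no longer be recomputed from the content alone, so the application must keep the returned address, and identical objects are no longer deduplicated. The suffix also lives in the address, not in the file, which is a different situation from the appended-metadata case Kaminsky raises next.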

Kaminsky disagreed, saying "cryptography tends to be a ... garbage algorithm in, garbage security out" discipline. "Let's say they were appending custom metadata to the end of their files. Conceivably, the attack would not care, as once two files have the same hash, you can append the same (identical) metadata to both of them and they'll still possess the same hash."
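Kaminsky's point follows from the iterated (Merkle-Damgard) structure that MD5 uses: if two equal-length messages leave the internal chaining state identical, any identical suffix keeps the outputs identical. A real MD5 collision is impractical to reproduce here, so this Python sketch uses a hypothetical toy hash with a 16-bit state and 4-byte blocks, where an internal collision can be found by brute force. `compress`, `md_pad`, `toy_hash` and `find_internal_collision` are illustrative names, and the toy is not MD5 itself.

```python
import hashlib

BLOCK_SIZE = 4   # bytes per block (toy value; MD5 uses 64-byte blocks)
IV = 0x1234      # toy 16-bit initial chaining value


def compress(state: int, block: bytes) -> int:
    """Toy compression function with a 16-bit chaining value."""
    digest = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")


def md_pad(message: bytes) -> bytes:
    # Merkle-Damgard strengthening: a 0x80 marker, zero fill, then a length block.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK_SIZE)
    return padded + len(message).to_bytes(BLOCK_SIZE, "big")


def toy_hash(message: bytes) -> int:
    """Iterate the compression function over the padded message."""
    state = IV
    padded = md_pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state


def find_internal_collision() -> tuple[bytes, bytes]:
    """Find two different one-block messages that leave the chaining state equal."""
    seen: dict[int, bytes] = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK_SIZE, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1


a, b = find_internal_collision()
for metadata in (b"", b"retention=7y", b"archived-by:example"):
    # Identical metadata appended to both files preserves the collision.
    assert toy_hash(a + metadata) == toy_hash(b + metadata)
```

Because both messages have the same length, their padding is identical too, so everything after the colliding block is processed from the same state and produces the same result.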

Dave Wagner, an assistant professor of computer science at the University of California, Berkeley, specializing in cryptography and security, said any collision of data through the MD5 algorithm "won't happen by accident." But he also emphasized that just because a system uses MD5 doesn't mean it's unreliable.

"The kind of questions I'd ask is, Has this system undergone any independent security review? Because the exact level of risk depends on so much. But if you're using a flawed algorithm I'd want more scrutiny," Wagner said.

With an eye on the MD5 issue, StorageTek has announced that it is reselling the Permeon Compliance Vault from start-up Permabit for e-mail archiving -- and chose Permabit's technology in part because it didn't use the MD5 algorithm, according to Harvey Andruss, product marketing manager at StorageTek.

Permabit's product uses the Secure Hash Algorithm. Unlike Centera, Permabit uses an appliance to create the objects and can then store those objects on any number of hardware platforms. It also uses the Network File System interface for Unix and the Common Internet File System interface for Windows servers rather than a proprietary application programming interface.
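The article does not say which Secure Hash Algorithm variant Permabit uses; SHA-1 is the one named as a more modern alternative. In a content-addressed design, switching algorithms mostly changes how the address is computed and how long it is. This Python sketch, with a hypothetical `content_address` helper, shows the address width for MD5, SHA-1 and SHA-256 using `hashlib.new()`. SHA-256 is included for comparison because SHA-1 has itself since been shown vulnerable to practical collisions.

```python
import hashlib


def content_address(data: bytes, algorithm: str = "sha1") -> str:
    """Hex content address using any algorithm name hashlib.new() accepts."""
    return hashlib.new(algorithm, data).hexdigest()


for name in ("md5", "sha1", "sha256"):
    addr = content_address(b"archived e-mail", name)
    # Each hex character carries 4 bits: 128, 160 and 256 bits respectively.
    print(f"{name:>6}: {len(addr) * 4}-bit address")
```

Wider digests make both accidental and engineered collisions harder, at the cost of longer addresses and somewhat more computation per object.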

The StorageTek version of Permabit's technology is called the Lifecycle Fixed Content Manager 100 and starts at $75,000 retail.

Other CAS users, like Halamka, said they are less concerned about the MD5 hash security issue.

Curt Tilmes, a system engineer at NASA's Goddard Space Flight Center in Greenbelt, Md., has been beta-testing an Archivas Cluster (ArC) CAS system for archiving satellite data about the earth's atmosphere for more than a year. Tilmes said he feels the system is secure because it's on a private network behind firewalls.

"I suppose it wouldn't hurt (to use a more secure algorithm), but for my application, it wouldn't have an effect," he said.

Archivas introduced its ArC last year. ArC, like Centera, uses the MD5 hashing algorithm. The difference between Centera and Archivas' ArC technology is that ArC is software-based and can be installed on a variety of hardware platforms, whereas Centera comes on a proprietary hardware platform from EMC.

Mike Luter, CTO at the Cancer Therapy & Research Center in San Antonio, purchased an ArC storage system last year from Archivas after determining that EMC's $200,000-plus array was too costly and less flexible. Luter said he would like to see Archivas move to a more modern hashing method, such as the Secure Hash Algorithm (SHA-1).

But he also feels that the medical images he stores on his 2TB box are secure enough.

"The only way you can be totally secure is to unplug yourself and turn off your computer. If they're going to get in, they're going to get in. I look at security as being one of the main pieces you have to look at, but you have to look at redundancy and productivity. You can be really secure and hurt yourself production-wise," he said.