Petascale storage may trickle down to you

06.11.2006

Storage systems have the unfortunate quality of not scaling well. Here are some of the problems that PDSI researchers will try to solve:

-- Disk access times have not kept pace with disk capacity. In 1990, a computer could read an entire hard drive in under a minute. Now it takes three hours or so to read the largest disks. 'It's only going to get worse, and it will take longer and longer to recover from a disk failure,' Miller says.

-- As the number of disks in a system increases, so does the probability that one will fail in any given period of time. Right now, big systems at the national laboratories fail once or twice a day, but with multi-petabyte systems, that rate could increase to a failure every few minutes.

-- When a disk does fail, the remaining disks that must restore the affected data to another disk have to work even harder, increasing the chances that one of them will fail too.
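
The arithmetic behind the first two problems can be sketched in a few lines of Python. This is only an illustration, not a model used by the PDSI researchers: it assumes disks fail independently at a constant rate, and the function names, the annual failure rate and the disk counts are hypothetical values chosen for the example.

```python
import math

HOURS_PER_YEAR = 24 * 365


def full_read_hours(capacity_gb, throughput_mb_per_s):
    """Hours needed to read an entire disk sequentially (1 GB = 1000 MB)."""
    seconds = capacity_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def _failure_rate_per_hour(annual_failure_rate):
    # Convert an annual failure probability into a constant hourly rate
    # (assumption: exponentially distributed disk lifetimes).
    return -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR


def prob_any_failure(num_disks, annual_failure_rate, window_hours):
    """Probability that at least one of num_disks fails within window_hours."""
    rate = _failure_rate_per_hour(annual_failure_rate)
    return 1.0 - math.exp(-num_disks * rate * window_hours)


def mean_hours_between_failures(num_disks, annual_failure_rate):
    """Expected time between failures anywhere in the system."""
    rate = _failure_rate_per_hour(annual_failure_rate)
    return 1.0 / (num_disks * rate)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    for disks in (1_000, 10_000, 100_000):
        hours = mean_hours_between_failures(disks, annual_failure_rate=0.03)
        print(f"{disks:>7} disks: one failure every {hours:.1f} hours on average")
```

Under these assumptions the time between failures shrinks in direct proportion to the number of disks, while the time to read (and so to rebuild) a disk grows whenever capacity rises faster than throughput.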

Applications at the national labs -- for example, simulations of the aging of nuclear weapons -- can run for months. They generate huge amounts of data, in part because they periodically copy the contents of memory to disk as 'checkpoints' in case a disk or processor fails. Researchers will look for faster checkpoint/restart methods, better fault-tolerance technologies and more efficient file systems.
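
As a rough illustration of the checkpoint idea, the Python sketch below saves a toy program's state to disk at intervals and resumes from the last saved copy after a restart. It is a minimal single-process example, not how the national labs' applications work: the names save_checkpoint, load_checkpoint and run_simulation are hypothetical, and the "simulation" is a stand-in counter.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to a temporary file, then rename it over the old checkpoint.

    The rename means a crash during the write leaves the previous
    checkpoint intact rather than a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run_simulation(total_steps, checkpoint_path, checkpoint_every=100):
    """Run a toy computation, resuming from the last checkpoint if present."""
    state = load_checkpoint(checkpoint_path, {"step": 0, "value": 0.0})
    while state["step"] < total_steps:
        state["value"] += 1.0  # stand-in for one step of real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, checkpoint_path)
    return state
```

The trade-off is the one the article describes: checkpointing more often loses less work after a failure, but every checkpoint adds to the volume of data written to disk.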