Recovery specialists revive dead data

24.07.2006
The urgent need to recover digital data after disasters like hard drive failures and power outages has remained largely unaltered since companies entered the Electronic Age. What has changed is the greater complexity of the devices that fail and their increased media capacity, according to specialists from Ontrack Data Recovery. Computerworld's Robert L. Mitchell spoke with Todd Johnson, vice president of operations, and Mike Burmeister, director of engineering for data recovery, about how data recovery has changed and what users can do to avoid having to call Ontrack in the first place.

How are the technical challenges of recovering data different today?

Burmeister: Ten to 15 years ago, you could solve many drive failures with simple procedures that could be summarized as mechanical in nature. Today, we deal with the same mechanical problems, but they are much more difficult because of ... the smaller drives and increased media capacity. On top of the mechanical problems, today there is an array of electronic failure scenarios that are very complex and require high-level engineering skills to solve.

Johnson: Recoveries involving Exchange databases and SQL databases are relatively new, and they're much more challenging to solve from a computer science perspective.

How have your recovery techniques changed?

Johnson: Many techniques are the same, with minor changes or adjustments. But we have also developed many new techniques to address the difficulties of electronics failures and the various forms of data corruption that leave drives in a completely dead state.

Are most problems hardware- or software-related?

Burmeister: Both. There can be multiple things going on, such as more than one hard drive failing in a RAID system. That is something we see on a regular basis. It can be something like a RAID controller failing: the drives may be working, but there's no longer any intelligence that knows how that bank was put together. The third thing is what we refer to as human-error problems. When there are drive failures ... the process is to swap out the bad drive. People pull out the wrong drive by accident and get the drives in the wrong order, which results in scrambled data.
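Why drive order matters can be shown with a deliberately simplified model. The sketch below assumes plain RAID 0 striping (no parity) with a tiny, hypothetical 4-byte chunk size; real arrays, and the RAID levels a recovery lab actually sees, are more complex. The functions `stripe` and `reassemble` are illustrative names, not any vendor's tool. The point it makes is Burmeister's: every drive can be intact while the reassembled volume is still scrambled, because the order of the drives is itself part of the data.

```python
CHUNK = 4  # hypothetical stripe chunk size in bytes; real arrays use far larger chunks


def stripe(data: bytes, n_disks: int, chunk: int = CHUNK) -> list[bytes]:
    """Write data round-robin across n_disks, one chunk at a time (RAID 0 style)."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]


def reassemble(disks: list[bytes], chunk: int = CHUNK) -> bytes:
    """Read chunks back round-robin, trusting that `disks` is in the original order."""
    out = bytearray()
    offset = 0
    while any(offset < len(d) for d in disks):
        for disk in disks:
            out += disk[offset:offset + chunk]  # slicing past the end yields b""
        offset += chunk
    return bytes(out)


if __name__ == "__main__":
    original = b"ABCDEFGHIJKLMNOPQRSTUVWX"
    d0, d1, d2 = stripe(original, 3)
    print(reassemble([d0, d1, d2]))  # correct order: b'ABCDEFGHIJKLMNOPQRSTUVWX'
    print(reassemble([d1, d0, d2]))  # first two drives swapped: b'EFGHABCDIJKLQRSTMNOPUVWX'
```

Nothing on any individual drive changed in the second call; only the assumed order did, which is why a recovery lab has to work out the original layout before the data makes sense again.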

If my server hard disk crashes, wouldn't I be better off recovering from tape than paying a premium to recover data from the failed disks?

Burmeister: We've done a lot of work on server recoveries where we can rebuild a system faster than [IT] can restore a tape. A day of being behind with data [can be] worth millions of dollars.

What is it going to cost me to get that data back?

Johnson: Our average recovery is around US$1,000. We do it in a two-step process. When we receive a drive or server, we do a first-step evaluation, which typically costs around $100. Within 24 to 48 hours, we are able to provide a complete report on the customer's situation and let them know what's recoverable and what's not. The entire turnaround time is three to five days.

What is your success rate?

Johnson: We can recover data from almost every device we get in. But we could recover 99 percent of the data, and if we're not able to recover the one file the customer needs, it counts as unsuccessful.

Burmeister: Setting aside drives that are physically damaged, we're close to 100 percent able to recover data. After Hurricane Katrina, we received hundreds of what we call flood jobs. Even there, we are close to 70 to 75 percent successful. Drives that have been in fires, drives that were intentionally thrown around by customers ... pretty severe things can happen, and we can still get at the data if the media is more or less intact.

What are the most common failures you see?

Burmeister: In general, moving parts and moving laptops don't go together well. If you're using a laptop, try not to move it around while it's actually running.

Johnson: Between 60 percent and 70 percent of the drives we receive have to go into the clean room, which means they are having physical problems. To avoid that, we talk about backing up, but one of the things that's most commonly missed is verifying that backups work. We get calls all the time from customers who say, "When I went to my backup, it didn't back up the right volume," or "The tape drive wasn't working."
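One simple way to act on Johnson's advice is to test-restore a backup and check that what comes back matches what was supposed to be protected. The sketch below is a minimal illustration, assuming the backup has been restored to a separate directory; the function names and the restore-to-a-directory workflow are hypothetical, not an Ontrack procedure. It compares SHA-256 digests of every source file with its restored copy, so it would flag a job that backed up the wrong volume (files missing) or wrote bad data (digests differ).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_backup(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return a sorted list of files that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        copy = restored_dir / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every source file came back byte-for-byte identical. Note that this only checks files present in `source_dir` at the time of the check; files added or changed after the backup ran will show up as missing or different, so the comparison is most meaningful right after a backup completes.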

How successful are you at recovering from tape?

Burmeister: A lot of those involve one or two read errors, and we can usually recover those without much difficulty. The others are human errors, when people accidentally erase the tape. These are more complex and difficult. Formatting the wrong tape is the most common [problem] by far. The other is backing up over a tape that already has information on it. In general, they back up the wrong data.

When is a problem beyond the scope of the self-service tools, and how do users know when to use them and when not to?

Burmeister: With the EasyRecovery software that we sell, we go to great lengths to make sure we don't provide something that will potentially make matters worse. We see problems coming into the labs every day where if the customer hadn't taken that last step, we could have gotten their data back. We don't change what's on the media. We try to extract the data onto new media.

Johnson: If the data is really critical and if the user isn't technically savvy enough to use the software, they should contact a recovery service. If a drive is physically failing and you run a lot of diagnostic utilities, you can do damage.

What is the most difficult type of failure to recover?

Johnson: Hard drives that were in a hurricane or flood are often the most challenging, time-consuming and costly. The drives are generally sealed, but they definitely get wet inside. Just as bad is when they get out of the water [and] dry out again. We're still getting drives from people recovering from the hurricane. These are drives that have had buildings fall on top of them or have been submerged in mud. Users say, "I'll dry it out and fire it up." When that happens, the heads get stuck to the platters.

What advice can you give an IT manager who is facing potential data loss?

Johnson: Too often, we see them assuming that all is lost and that there is no hope. Often, we're contacted at the executive level because IT has said all is lost. In the vast majority of cases, we can help, if not 100 percent, then to some degree.

Burmeister: When you do decide to look for a supplier, do your homework and ask a lot of questions. Make sure you get a written quote for both the worst case and the best case before you send the drive anywhere.