Demystifying de-duplication

22.02.2007
Data de-duplication has emerged as a key technology in the effort to reduce the amount of data backed up daily, which in many enterprises is growing at more than 100 percent per year.

For example, John Thomas, IT manager at Atlanta-based law firm Troutman Sanders LLP, was able to use data de-duplication technology to reduce the amount of data streamed from more than a dozen remote offices, cutting his backup window from 11 hours to 50 minutes. Thomas says compression ratios for his backups run as high as 55:1.

Vendors have taken different approaches to the technology, resulting in several distinct products. Users should become familiar with these in order to choose the flavor that best suits their environments.

Data de-duplication uses commonality factoring to reduce the amount of data either at the backup server or at the target storage device. Because of the enormous compression ratios the technology achieves, disk is becoming a viable online alternative to traditional tape-based backup.

Consider remote and branch offices, where workers need instant access to all the data and applications available at their company's headquarters. IT shops have typically met that need by setting up remote mini data centers, complete with application servers, block and file storage, backup tape and report printers, sacrificing centralized administrative control in the process. With de-dupe technology, backups can instead be performed over the WAN using spare nighttime bandwidth, eliminating the need for tape at remote sites.
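The core idea behind commonality factoring can be illustrated with a minimal sketch. This is not any vendor's actual implementation: it assumes simple fixed-size chunking (commercial products often use variable-size, content-defined chunks), and the function names `dedupe_backup` and `restore` are hypothetical.

```python
import hashlib

def dedupe_backup(data: bytes, chunk_size: int = 4096, store: dict = None):
    """Split data into fixed-size chunks, storing each unique chunk once.

    Returns (recipe, store): recipe is the ordered list of chunk hashes
    needed to rebuild the data; store maps hash -> chunk bytes. Passing
    the same store across multiple backups lets them share chunks.
    """
    store = {} if store is None else store
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicate chunks are not stored again
        recipe.append(digest)
    return recipe, store

def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its recipe of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)
```

Running repeated backups of largely unchanged data through the same store is what produces ratios like the 55:1 cited above: each backup adds only the chunks it does not already share with earlier ones.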

Greg Schulz, senior analyst at The StorageIO Group, says de-duplication technology mainly resides in the backup space, complementing traditional tape libraries with the purpose of lowering costs and reducing data.

The main benefit of de-dupe technology is that your virtual tape library isn't filling up, and you're "not seeing your backup targets fill up as fast as it normally would," Schulz says.