The heat is on to back up

Von Ee Sze

Increasingly, organizations find themselves having to archive data to meet regulatory requirements and avoid legal exposure. In Singapore, for example, the Monetary Authority of Singapore (MAS) requires banks to keep records of financial information for seven years.

There has also been growing awareness of the need to archive email following a landmark case in March, in which the High Court ruled that agreements made by email for property transactions can be legally binding.

The case involved a deal in which the Singapore unit of German logistics firm Schenker was to lease a warehouse from SM Integrated for two years. The court upheld the lease agreement made via email and awarded the warehouse landlord S$500,000 (US$303,000) in damages for lost rental.

A perfect storm of litigation requiring electronic data discovery, regulations governing data retention, disappearing backup windows (thanks to enormous data growth and nonstop business operations) and large-scale catastrophes has catapulted backup and recovery to IT's head table.

And, reflecting its newfound status, backup and recovery is taking on a more sophisticated, grown-up name: data protection, which encompasses backup, recovery, archiving, retrieval, disaster recovery and business continuity. "This is a phenomenal time for storage, and particularly for data protection," says Arun Taneja, president of Taneja Group.

According to IDC, the backup, archiving and replication software market will grow from US$4.3 billion in 2003 to $6.58 billion by 2008, representing 54 per cent of storage software expenditures.

While the term "data protection" covers a lot of ground, it's the first four areas (backup, recovery, archiving and retrieval) that are currently of highest interest, says Pete Gerr, senior analyst at Enterprise Strategy Group.

Companies now realize they must be able to recover specific pieces of data from financial records, email, instant messaging logs and the like if those records are subpoenaed as evidence in a legal case, Gerr says.

The bottom line is that backup, restoration and safe archiving of electronic data can no longer be a "hope it works" proposition, because very often it does not work. Forrester Research has said that 30 per cent of failed data recovery attempts are due to botched backups.

Jon Murray, regional program manager, EMC South, says botched backups typically occur because of the enormous and continuing growth of information in the production environment, a failure to match service levels to the right tier of storage, or the use of multiple backup servers streaming to multiple tape drives.

"Most backups simply stop in the middle of the backup cycle and businesses are left unaware that they do not have a full and complete copy of production data," he says.

Most failed backups are due to human errors, not mechanical errors, says Jim Simon, director of Marketing, Asia Pacific, Quantum. He recommends an automated backup system such as an autoloader or tape library working in conjunction with backup software like Veritas Backup Exec. "Automating the backup eliminates the need for a human to run the backup every night and reduces or eliminates the need to physically swap tapes during the backup process."

Narayan M, senior consultant, Brocade Communications, suggests that one way to minimize backup errors is to centralize backups so that there is more control over the backup process. As additional safeguards, backups should be checked to confirm that they completed successfully, and tested periodically to ensure that the backed-up data can be restored.

Indeed, a complete data protection strategy should include verifying that the data can be restored, says Edward Pearson, Tape Storage Solutions engineer at Exabyte. "Strategies should include checks on equipment reliability and restore performance through actual restore operations."
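One way to make "actual restore operations" a routine check is to restore a backup into a scratch location and compare it file by file with the source. The sketch below is a minimal illustration, assuming the backup can be restored to an ordinary directory; the function names are hypothetical and not part of any vendor's product. Comparing SHA-256 checksums catches both corrupted files and the missing files left behind by a backup that stopped in the middle of its cycle.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source: Path, restored: Path) -> list[str]:
    """Compare a test restore against the source; return a list of problems."""
    expected, actual = manifest(source), manifest(restored)
    problems = [f"missing: {name}" for name in expected if name not in actual]
    problems += [
        f"corrupt: {name}"
        for name, digest in expected.items()
        if name in actual and actual[name] != digest
    ]
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, restored = Path(tmp, "src"), Path(tmp, "restored")
        src.mkdir()
        restored.mkdir()
        for i in range(3):
            (src / f"ledger_{i}.txt").write_text(f"record {i}\n")
        # Simulate a backup that stopped part-way: only two files come back.
        for name in ["ledger_0.txt", "ledger_1.txt"]:
            shutil.copy(src / name, restored / name)
        print(verify_restore(src, restored))  # ['missing: ledger_2.txt']
```

An empty list means the test restore matched the source exactly; anything else is a backup that would have failed when it was needed.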

In the case of archived data, retrieval is a major pain point for many enterprises today, says Murray of EMC. He cites the common example of companies having to restore an entire email server just to extract one email urgently needed by a chief executive officer.

"In archiving, there must be a mechanism for an organization to search through massive amounts of archives, which may reside on different media (disk or tape) and be filed according to the value of the information," says Yeong Chee Wai, manager, Pre-sales Consulting, Symantec Singapore.
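The search mechanism Yeong describes usually depends on a catalog that records what was archived and where it lives, so a single email can be located without restoring a whole server. The sketch below is a toy in-memory catalog; the record fields (sender, subject, medium, location) and class names are hypothetical assumptions, and a real archiving product would use a persistent, indexed store.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchiveEntry:
    # Hypothetical catalog record: what was archived and where it lives.
    item_id: str
    sender: str
    subject: str
    sent: date
    medium: str    # e.g. "disk" or "tape"
    location: str  # e.g. a tape barcode plus offset, or a disk path


class ArchiveCatalog:
    """A toy in-memory index over archived items held on mixed media."""

    def __init__(self) -> None:
        self._entries: list[ArchiveEntry] = []

    def add(self, entry: ArchiveEntry) -> None:
        self._entries.append(entry)

    def search(self, keyword: str, start: date, end: date) -> list[ArchiveEntry]:
        """Return entries sent within [start, end] whose sender or subject contains keyword."""
        kw = keyword.lower()
        return [
            e for e in self._entries
            if start <= e.sent <= end
            and (kw in e.subject.lower() or kw in e.sender.lower())
        ]


if __name__ == "__main__":
    catalog = ArchiveCatalog()
    catalog.add(ArchiveEntry("m1", "ceo@example.com", "Warehouse lease terms",
                             date(2005, 3, 1), "tape", "TAPE0042:117"))
    catalog.add(ArchiveEntry("m2", "ops@example.com", "Shift rota",
                             date(2005, 3, 2), "disk", "/archive/2005/03/m2.eml"))
    for hit in catalog.search("lease", date(2005, 1, 1), date(2005, 12, 31)):
        print(hit.item_id, hit.medium, hit.location)  # m1 tape TAPE0042:117
```

The search returns only the medium and location of the matching item, so retrieval touches one tape or one file rather than a full server restore.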

Archives are built to maintain and protect fixed content for long periods of time so that it can be retrieved for specific business uses, Murray points out. They are quite different from backups, which are generally short-term.

"Understanding the difference between archive and backup allows users to fit the best storage technology to each purpose, maximizing reliability and cost-effectiveness in their data protection strategy," says Pearson.

An archive is a complete set of data from a certain point in time, he explains. "An archive is meant to preserve data in the event it must be referenced at a later time." In his view, a reliable and inexpensive storage medium would be a good choice for this application.

A backup, on the other hand, is a rolling, continually updated copy of data that allows the most recent version to be restored in the case of data corruption. Keeping a backup copy of data on nearline storage enables quick recovery from an unplanned event and is a good choice for keeping data highly available, says Pearson.

Murray emphasizes that backup and archiving are designed to deliver against very specific and very distinct requirements. As such, many large organizations actually keep backup and archive separate as they are used for different purposes. He provides three simple rules of thumb to differentiate between backup and archive:

1) Backups are for recovery; archives are for retrieval.

2) Backups are short-term; archives are long-term.

3) Backups aren't good for compliance; archives are.
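The second rule of thumb translates directly into retention policy: backup copies roll off after a short window, while archive copies are held for years. The sketch below assumes a hypothetical 30-day backup window and a seven-year archive period (echoing the MAS example, approximated as 7 x 365 days, ignoring leap days); both figures are illustrative, not recommendations from the people quoted here.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods, for illustration only.
RETENTION = {
    "backup": timedelta(days=30),        # short-term, rolling
    "archive": timedelta(days=7 * 365),  # long-term, roughly seven years
}


@dataclass(frozen=True)
class Copy:
    kind: str  # "backup" or "archive"; any other value raises KeyError
    created: date


def expired(copy: Copy, today: date) -> bool:
    """True when a copy has outlived the retention period for its kind."""
    return today - copy.created > RETENTION[copy.kind]


def prune(copies: list[Copy], today: date) -> list[Copy]:
    """Keep only the copies still inside their retention window."""
    return [c for c in copies if not expired(c, today)]


if __name__ == "__main__":
    today = date(2006, 1, 31)
    copies = [
        Copy("backup", date(2005, 12, 1)),   # older than 30 days: pruned
        Copy("backup", date(2006, 1, 30)),   # recent backup: kept
        Copy("archive", date(2000, 6, 1)),   # under seven years old: kept
    ]
    for c in prune(copies, today):
        print(c.kind, c.created)
```

Keeping the two kinds under separate rules reflects Murray's point that backup and archive serve different purposes; a single retention setting would either waste media on old backups or discard archives too early.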

Simon of Quantum uses a banking scenario to illustrate the differences. Each day the bank should back up its data in case of a physical catastrophe, such as a fire, or a virus attack, he says. At the same time, the bank should save an archive copy of the data for an indefinite period so that it can provide customers with a copy of their account history, even years later.

According to Simon, companies turn to tape for backup and archiving because it is removable and can be stored off-site. Other factors going for tape are that it has a long shelf life of 30 years, offers the lowest cost per gigabyte of any medium (for example, a Quantum LTO-3 cartridge has a cost per GB of US$8), and it is easy to clone, so multiple copies can be kept at multiple locations for added protection.

In recent years, however, disk has been encroaching on this space, with disk manufacturers increasing disk capacities and pushing down the cost per MB of disk storage. "For the first time, companies have a cost-effective alternative to tape for their enterprise storage needs," says Robert Yang, Seagate's senior director and general manager, Channel Sales and Marketing, Asia Pacific. "They can now choose to back up or archive their information on online disk storage or nearline storage versus on tape. This enables them to continue their business while gaining timely access to their information, thanks to faster retrieval times," he says.

He gives the example of Seagate's newly launched External Hard Drive, whose new FireWire 800 interface offers faster access to backup or archival information. Having the information on disk enables companies to retrieve it rapidly and prevent any significant downtime or gap in the customer experience or business operations, says Yang.

With disk-to-disk devices and virtual tape libraries, backups can run within reasonable time frames, and more data can be kept online, which enables faster recoveries.

"Disk storage is being used either as an exclusive method of backup or as an intermediate or staging area before going to tape," says Bill North, director of Research for IDC's Storage Software Service.

For now, however, tape is still the least expensive means of long-term archival. Media costs aside, regulatory requirements that data be stored on media that cannot be erased eliminate many disk-based storage systems, with the exception of EMC's Centera.

And unlike tape, which can be transported off-site, storing data on disks at a remote site involves replication over costly bandwidth, assuming that it is available in the first place.

The major challenge here is bandwidth availability in different countries and locations, says Murray.

Simon points out that data continues to grow at a conservative 50 per cent per year. "Remote backups require an enormous bandwidth. Not only are there technical challenges to overcome, there are expense challenges," he says.
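A quick calculation shows why 50 per cent annual growth strains remote backup. The sketch below compounds the growth rate and estimates how long a full copy takes over a WAN link; the 1 TB starting size and 100 Mbit/s link are hypothetical figures, and the transfer estimate assumes the link is fully and steadily used, which real links rarely achieve.

```python
def projected_size_tb(initial_tb: float, growth_rate: float, years: int) -> float:
    """Data volume after compound annual growth (growth_rate=0.5 means 50 per cent)."""
    return initial_tb * (1 + growth_rate) ** years


def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours to send size_tb over a link of link_mbps, assuming full, steady use."""
    bits = size_tb * 1e12 * 8  # decimal terabytes to bits
    return bits / (link_mbps * 1e6) / 3600


if __name__ == "__main__":
    # Hypothetical example: 1 TB today, growing 50 per cent a year, 100 Mbit/s link.
    for year in range(4):
        size = projected_size_tb(1.0, 0.5, year)
        print(f"year {year}: {size:.3f} TB, {transfer_hours(size, 100):.1f} h")
```

Under these assumptions a full copy that takes about 22 hours today takes 75 hours by year three, because the data (and so the transfer time) grows to 1.5^3 = 3.375 times its starting size while the link stays the same.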

Still, remote backup may be a viable option or even a desired capability for some users. As Pearson of Exabyte says, users should consider their needs for archival and nearline backup. "Matching the right technology to an application is a great way for users to minimize technology disadvantages, increase reliability/performance and save overall cost," he says.

One reason an organization may want to do remote backups is to consolidate data from branch offices in a centralized backup location. "Here the advantages are obvious. By consolidating data in the remote office from low-end servers and storage onto, for example, a Microsoft Windows Storage Server-based device, you create a single data storage 'bunker' at the remote site. You have reduced costs, simplified remote management and have a scalable storage system that can grow with you," says Murray. "When remote office data is replicated and consolidated to a central location, it can be backed up by trained personnel while leveraging equipment and software already established and proven in the data center."

Meanwhile, improvements in compression technologies and WAN acceleration devices are removing some of the issues surrounding performance and bandwidth.

Brocade, for example, has introduced its new Tapestry Wide Area File Services (WAFS), which centralizes branch data onto a single backup infrastructure. Using storage caching technologies and WAN optimization, WAFS allows users to save files to a central repository with considerable performance improvements over the wide area network, says Narayan.

According to Narayan, there are two ways to do remote backups or replication: synchronous and asynchronous. Synchronous replication needs fast, high-throughput links between the primary and secondary data centers. One example is dark fiber between the data centers.

While this is a very fast medium, dark fibers tend to be expensive, he says.

Asynchronous replication can be achieved by Fibre Channel over IP (FCIP) tunnelling across a WAN. Although much slower than dark fiber, this approach is much less expensive and takes advantage of existing WAN infrastructure to span greater distances, Narayan adds.
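The practical difference between the two modes is when a write is acknowledged. The toy model below is a minimal sketch, not how any particular array or FCIP product works; the class names are hypothetical and the remote site is just an in-memory dictionary. In synchronous mode a write lands at both sites before returning, which is why it needs a fast link such as dark fiber. In asynchronous mode the write returns after the local copy and is shipped later as the WAN allows, so writes still queued can be lost if the primary site fails.

```python
from collections import deque


class Site:
    """A storage site holding named blocks of data."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}


class SyncReplicator:
    """Synchronous: a write completes only after both sites hold it."""

    def __init__(self, primary: Site, secondary: Site) -> None:
        self.primary, self.secondary = primary, secondary

    def write(self, key: str, data: bytes) -> None:
        self.primary.blocks[key] = data
        # A real system would block here until the remote site acknowledges.
        self.secondary.blocks[key] = data


class AsyncReplicator:
    """Asynchronous: a write completes after the local copy; changes ship later."""

    def __init__(self, primary: Site, secondary: Site) -> None:
        self.primary, self.secondary = primary, secondary
        self.pending: deque = deque()  # (key, data) pairs waiting for the WAN

    def write(self, key: str, data: bytes) -> None:
        self.primary.blocks[key] = data
        self.pending.append((key, data))

    def drain(self, max_items: int) -> None:
        """Send up to max_items queued writes, in order, as link capacity allows."""
        for _ in range(min(max_items, len(self.pending))):
            key, data = self.pending.popleft()
            self.secondary.blocks[key] = data


if __name__ == "__main__":
    primary, remote = Site(), Site()
    rep = AsyncReplicator(primary, remote)
    rep.write("a", b"1")
    rep.write("b", b"2")
    rep.drain(max_items=1)
    print(sorted(remote.blocks))  # ['a']; "b" would be lost if the primary failed now
```

The queue length in the asynchronous case is a rough proxy for how much recent data is at risk, which is the trade-off against the cost of a synchronous link.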

Other options for remote backup are either to implement an isolated or remotely administered backup system at the remote location, or to use replication technology to copy the data to the main location and back it up from there, says Yeong of Symantec.

The first option does not utilize much network bandwidth, but requires more administrative attention at the remote location for tasks like media management and backup system maintenance, he says.

The second option uses more network bandwidth, but has the advantage of requiring virtually no administrative effort at the remote location. The bandwidth required depends on the rate of data change at the remote location.

"The decision between the two options lies in the amount of data and rate of data change at the remote site," says Yeong.
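Yeong's rule of thumb can be roughed out as a capacity check: if one day's changed data can cross the WAN within the available window, replicating to the main site is feasible; otherwise a local backup system at the remote site is the safer choice. The sketch below is a hypothetical decision rule, not Symantec's method; the 70 per cent usable-link figure, the window length and the example numbers are all assumptions.

```python
def replication_hours(daily_change_gb: float, link_mbps: float,
                      utilization: float = 0.7) -> float:
    """Hours to ship one day's changed data over the WAN.

    utilization is an assumed fraction of the link available for replication.
    """
    bits = daily_change_gb * 1e9 * 8  # decimal gigabytes to bits
    return bits / (link_mbps * 1e6 * utilization) / 3600


def choose_remote_option(daily_change_gb: float, link_mbps: float,
                         window_hours: float) -> str:
    """Hypothetical rule: replicate if the daily change fits in the window."""
    if replication_hours(daily_change_gb, link_mbps) <= window_hours:
        return "replicate to main site"
    return "local backup at remote site"


if __name__ == "__main__":
    # Hypothetical branch: 10 Mbit/s link and an 8-hour overnight window.
    for change_gb in (20, 50):
        print(change_gb, "GB/day:", choose_remote_option(change_gb, 10, 8))
```

With these assumed figures, 20 GB of daily change needs about 6.3 hours and can be replicated, while 50 GB needs nearly 16 hours and points toward a local backup system.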

As the technical and cost issues are ironed out, remote backups will play a growing role in organizations' disaster recovery strategies.

As Murray explains, "The data center can deliver rapid business resumption if a remote location goes down: a second copy of data is immediately available. And the approach can greatly reduce the costs of hardware, software, support and administration for all remote locations. Since copies of the branch office data are now stored off-site at the data center, this data is protected from disasters such as fire, flood or theft that may take place at the remote location."

(Additional reporting by Mary Brandel)