Retail Pro - Serious Software for Serious Retailers

Data Protection/Disaster Recovery – Planning for the Unthinkable (Part I)

Posted in Announcements, Data Protection, Disaster Recovery on July 20th, 2010 by Larry Wynia

Equipment failure, data corruption, viruses, security threats, and even natural disasters all represent potential threats, or risks, to one of a retailer's most valuable assets: its data.

One of the most important exercises any retailer, small or large, can perform is to develop and maintain a comprehensive Disaster Recovery Plan. Since it does no good to begin planning after a disaster has occurred, it's essential that you give this exercise a high priority, and that all of your processes and procedures are in place before any need arises to recover data, systems, or facilities.

While a comprehensive discussion of Disaster Recovery is beyond the scope of this blog (volumes have been written on the subject), we can discuss some of the most critical steps that a small retailer, without the resources of an IT staff, can take right now to avoid data loss and disaster.

Obviously, your data is vital to your retail business. Your inventory, customer lists, accounting information, applications, and almost everything else you need to run your business are stored as data on your server and workstation hard drives. How important is this data to you? If you answered "very important", then at a minimum you should have:

  • RAID configured disks
  • Daily backups to tape or other media
  • Offsite Storage for backups
  • Recovery Testing and validation

RAID Configured Disks

RAID is a technology that configures multiple hard drives as an array that appears to the system as a single large drive. Depending on the RAID level, an array can improve disk performance, increase storage capacity, and improve fault tolerance and server availability by tolerating a single hard drive failure (sometimes more) without data loss or the need to recover data from backup. With hot-standby and hot-swap technologies, many hard drive failures can be recovered from without any server downtime.

Various RAID levels have been developed for performance improvements and fault tolerance, including RAID1, RAID5, RAID10, RAID50, RAID0+1, and others. RAID can provide real-time data redundancy at low-to-moderate cost, and represents an essential step in providing fault tolerance for your servers and data. Two relatively low-cost RAID implementations that provide a good degree of data redundancy, fault tolerance, and server availability are RAID1 and RAID5.

RAID1 (data mirroring): data is written identically to two or more disks. The array provides fault tolerance and server availability as long as any one of the disks in the array remains operational.

RAID5 (data striping with parity): data is divided into blocks, and the blocks are written across the drives in the array (striped) for faster performance. Parity information is calculated for each stripe and distributed across the disks, allowing the data to be rebuilt should any one disk in the array fail.
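To make the parity idea concrete, here is a minimal Python sketch of a single stripe. It assumes XOR parity, which is how RAID5 parity is commonly computed; the function names and the sample block contents are made up for illustration. A real controller works on fixed-size disk blocks and rotates the parity block across the drives, which this sketch does not attempt to model.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (a single stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild_missing(stripe, failed_index):
    """Recompute the block on the failed disk from the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

# Three hypothetical data blocks (one per disk) plus a fourth parity block.
stripe = make_stripe([b"INVENTRY", b"CUSTOMER", b"ACCOUNTS"])

# Pretend the second disk failed and rebuild its block from the other three.
print(rebuild_missing(stripe, 1))  # b'CUSTOMER'
```

Because XOR-ing all blocks in a stripe (parity included) yields zero, any one missing block equals the XOR of the others. That is also why a second simultaneous failure cannot be recovered from parity alone.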

There are many other RAID solutions available that provide additional benefits at increased cost, but if your business's critical data does not currently reside on at least a RAID1 or RAID5 storage subsystem, you need to take immediate action to ensure that it does.

Daily Backups to Tape or Other Media

Even though your data may be sitting on a RAID storage subsystem, you need to take data protection and disaster recovery planning to the next level by ensuring that your data is backed up to tape, another disk, optical media (CD, DVD, Blu-ray), the cloud, or similar. Although RAID provides fault tolerance against hard drive failure, it cannot protect against deleted files, a corrupt database, a computer virus, or a natural disaster that destroys hardware and the data that resides on it.

A tape backup strategy that backs up the entire disk contents, including the operating system, applications, and application data, is critical to the recoverability of any business's operations in the event of disaster. This should allow for the recovery of the entire server(s).

Another approach to recovering an entire server is to ensure you retain your operating system installation disks, retain access to all your application installation files, retain all installation key codes, and perform regular, scheduled backups of your application data. In the event a server needs to be rebuilt, you can install the operating system and applications, then restore the data. However, this approach requires a longer recovery time and significantly more work.

Yet another approach to backing up a server is to periodically create images of your servers using one of the many 'disk imaging' software solutions on the market, and to conduct regularly scheduled backups of your application data.

If your application has a built-in backup utility, use it for backing up the application data. Not all data is recoverable simply because you've backed up data files. Applications built on relational databases store their information in a series of data files and transaction logs, and copying those files while the database is in use can capture them in an inconsistent state. The application's backup utility typically accounts for the transaction logs when creating the backup, ensuring the integrity of the backup. You will need to use the backup utility that came with the application, or one provided by a third party, to properly back up and restore relational database application data.
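As an illustration of using a database's own backup facility rather than copying its files, here is a small Python sketch. It uses SQLite purely as a stand-in, because Python ships with it; your application almost certainly uses a different database with its own utility, and the file names, table, and function names below are hypothetical.

```python
import os
import sqlite3
import tempfile

def backup_database(source_path, backup_path):
    """Copy a SQLite database using the engine's online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # Connection.backup() produces a consistent copy even if other
        # connections are writing to the source while it runs.
        source.backup(target)
    finally:
        target.close()
        source.close()

def check_integrity(db_path):
    """Return SQLite's own integrity verdict for the file ('ok' if healthy)."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()

# Demonstration with a throwaway database in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    live_db = os.path.join(workdir, "store.db")
    backup_db = os.path.join(workdir, "store-backup.db")

    conn = sqlite3.connect(live_db)
    conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
    conn.execute("INSERT INTO inventory VALUES ('SKU-1', 12)")
    conn.commit()
    conn.close()

    backup_database(live_db, backup_db)
    print(check_integrity(backup_db))  # ok
```

The point of the sketch is the division of labor: the database engine decides how to produce a consistent copy, and your ordinary file backup then picks up the resulting backup file.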

Offsite Storage for Backups

To effectively guard against any disaster, you should not store your backup tapes in the same facility where your computer systems reside. Loss of the facility, or loss of access to the facility, may result in 100% data loss. There are many commercial solutions available for storing your backup media. Things to consider are:

Proximity to Business Location: keep the data close to allow for quick delivery, but not so close that the storage facility can be part of the same natural disaster.

Physical Security:  any 3rd party data storage provider should have a high degree of control over who enters and has access to their facilities.

Security during Transit: any 3rd party provider should have measures for ensuring the security of the data while in transit.

Password protection and encryption technologies can also be implemented to keep the data confidential while the backup media is offsite.

Recovery Testing and Validation

Testing backup recovery is the only way to ensure that backups are working and will be available to you should you need them. Procedures for recovery of systems and application data need to be tested and documented to ensure they work properly and to provide the training necessary to perform recovery when needed.

Regular review and validation that backups are successful also need to occur. It's not sufficient to assign the task of swapping tapes to an employee. The backup utility's logs should be reviewed regularly to ensure that backup jobs are completing and that the data actually resides on the backup media. More than once, I have seen businesses assign the duty of swapping tapes to an employee without regular, scheduled checks that the backup jobs were actually running properly, only to find out that there was no data available for recovery when it was needed.
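A recovery test can be partly automated. The following Python sketch shows one way to do it, under some loud assumptions: the backup is a gzip-compressed tar archive of a data folder, a manifest of SHA-256 checksums is recorded when the backup is made, and verification means restoring into a scratch folder and comparing checksums. The function names and file names are hypothetical, and a commercial backup product will have its own verification features that you should prefer.

```python
import hashlib
import os
import tarfile
import tempfile

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_tree(root):
    """Map each file path (relative to root) to its checksum."""
    hashes = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            hashes[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return hashes

def make_backup(source_dir, archive_path):
    """Archive source_dir and return the checksum manifest taken at backup time."""
    manifest = hash_tree(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        for rel_path in manifest:
            archive.add(os.path.join(source_dir, rel_path), arcname=rel_path)
    return manifest

def verify_backup(archive_path, manifest):
    """Restore into a scratch folder; list files that are missing or changed."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(scratch)
        restored = hash_tree(scratch)
    return sorted(path for path, digest in manifest.items()
                  if restored.get(path) != digest)

# Demonstration with a throwaway data folder.
with tempfile.TemporaryDirectory() as workdir:
    data_dir = os.path.join(workdir, "data")
    os.mkdir(data_dir)
    with open(os.path.join(data_dir, "customers.csv"), "w") as handle:
        handle.write("id,name\n1,Example Customer\n")

    archive_path = os.path.join(workdir, "nightly.tar.gz")
    manifest = make_backup(data_dir, archive_path)
    print(verify_backup(archive_path, manifest))  # [] means every file restored intact
```

An empty list means every file in the manifest came back from the archive unchanged. A check like this complements, but does not replace, a periodic full recovery drill on real hardware.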

Summary

Every business, large or small, is dependent upon its servers and application data for continued operations. Large companies have IT staff to ensure the integrity and availability of their data. Small business owners must take on this added responsibility themselves or through the use of outside vendors. Either way, it is essential that the disaster recovery plan provides for fault tolerance, data redundancy, and a solid backup strategy with proven recoverability and secure off-site storage of critical data.

In future blog posts, I will dig deeper into various Disaster Recovery concepts.