The Basics of Backup and Disaster Recovery
“Disaster” is a word that brings up images of typhoons, earthquakes, and other major natural phenomena. In the IT world, though, disasters aren’t limited to acts of God. An IT disaster includes any technology failure that keeps a company from running and its employees from working. These failures are not ones that affect a single computer or user; they are failures that affect an entire company’s network or data center, or enough of either to have a significant business impact. IT disasters can include a website going down (think Amazon), a hospital’s electronic-medical-records (EMR) solution being inaccessible, or a bank’s online-banking application being unavailable.
According to Continuity Central, the most common causes leading to an IT disaster are:[i]
- Hardware failure
- Human error
- Software failure
When not planned for, these failures can be just as devastating and disruptive as major natural disasters for the company affected.
Downtime: The Other Side of a Data-Center-Level or Network-Level IT Disaster
Downtime is the aftermath of an IT disaster: the debris left behind. The disaster itself is just the catalyst. I don’t bleed if I’m not cut. The disaster is the cut. Downtime is the bleeding. And while downtime might not sound as frightening as a disaster, it can carry a high cost all the same.
According to a 2016 IHS study, downtime costs for midsize companies (those with 100 to 1,000 employees and annual revenues averaging $100 million) average $1 million per event.[ii]
The ideal solution to the cost of downtime is, of course, prevention. We’ve all heard about five-nines (99.999 percent) availability (uptime), which describes downtime’s countermeasure: keeping systems up. Numerous solutions and tactics are used to maintain high availability, including replicating data on different servers (known as backup servers), alternative power supplies, monitoring, security measures, and much more. Even with the best prevention measures in place, though, failures can happen. And when prevention fails, the next step is recovery. I didn’t avoid getting cut, so now I have to heal.
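To put five-nines in perspective, the availability percentage can be converted into the downtime it allows per year. The sketch below is only that arithmetic; the function name is hypothetical, and it assumes a 365-day year and that availability is measured over the whole year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare three-, four-, and five-nines targets.
for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five-nines leaves only a little over five minutes of downtime in a year, which is why it is so hard to achieve through prevention alone.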
Backup and Recovery: Minimizing Downtime After a Disaster
The common terms surrounding healing from IT disasters and the resulting downtime are backup and recovery (also known as disaster recovery).
Backup is just what it sounds like: copying data so that it can be restored later, often to a separate location. If an IT disaster occurs, a company will likely lose data. But what is backed up is not lost; with a backup, data can be restored.
Backup sounds simple, but the realities and nuances of network and data-center backups can be quite complex. There are different types of backups, from full to partial to differential, in addition to different storage solutions that can be used, such as tape, disk, flash, and the cloud. Backups can also be automated or done manually. The full range of backup types and storage solutions is more than can be covered here.
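The difference between backup types comes down to which files each run copies. The following is a minimal sketch, not a real backup tool: the file names, dates, and the `select_files` function are hypothetical, and it assumes the common definitions in which a differential backup copies everything changed since the last full backup. An incremental backup, which the text above does not name, is included as a common third type that copies only what changed since the most recent backup of any kind.

```python
from datetime import datetime


def select_files(mtimes: dict[str, datetime], kind: str,
                 last_full: datetime, last_backup: datetime) -> list[str]:
    """Return the files a backup run of the given kind would copy."""
    if kind == "full":
        return sorted(mtimes)          # a full backup copies everything
    if kind == "differential":
        cutoff = last_full             # changes since the last full backup
    elif kind == "incremental":
        cutoff = last_backup           # changes since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, mtime in mtimes.items() if mtime > cutoff)


# Hypothetical last-modified times for three files.
mtimes = {
    "archive.db": datetime(2023, 12, 28),
    "customers.db": datetime(2024, 1, 2),
    "orders.db": datetime(2024, 1, 4),
}
last_full = datetime(2024, 1, 1)    # most recent full backup
last_backup = datetime(2024, 1, 3)  # most recent backup of any kind

for kind in ("full", "differential", "incremental"):
    print(kind, select_files(mtimes, kind, last_full, last_backup))
```

The trade-off this illustrates is that smaller backups run faster but a restore may need more pieces: a differential restore needs the last full backup plus one differential, while an incremental restore needs the last full backup plus every incremental since.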
Recovery, like backup, is just what it sounds like: recovering from a disaster by restoring systems to working order, preferably to the state they were in just before the disaster occurred. In the case of lost data, recovery includes restoring data from the backups. Full recovery is not always possible using a backup, depending on the failure, but backups are the most common starting point.
Recovery, like backup, can be complex. Multiple factors play into the time and effort needed for recovery, including:
- The backup and recovery solution used
- The scope of the disaster’s impact
- The quality of backups and the time of the last backup
- The size of the failed system
- The amount of data involved
- Whether a disaster-recovery plan exists
- The point in time to which you need to recover, which determines how much recent data you can afford to lose, known as the recovery point objective (RPO)
- How quickly you need to be back up and running, and in what state, known as the recovery time objective (RTO)
An RPO is measured backward from the moment the failure occurred and can be stated in seconds, minutes, hours, or days. If a company opts for an RPO of one minute, for example, systems would have to be backed up at least every minute in order to recover the needed data after a disaster.
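A short sketch can make the two objectives concrete. It assumes the usual reading of the terms: the RPO limits the gap between the last good backup and the failure, and the RTO limits the gap between the failure and the moment systems are restored. The function names and the timeline are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure_time: datetime) -> timedelta:
    """Return the span of data lost: time between the last good backup and the failure."""
    if last_backup > failure_time:
        raise ValueError("last_backup must not be later than failure_time")
    return failure_time - last_backup


def meets_rpo(last_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data lost since the last backup is within the RPO."""
    return data_loss_window(last_backup, failure_time) <= rpo


def meets_rto(failure_time: datetime, restored_time: datetime, rto: timedelta) -> bool:
    """True if systems were restored within the RTO after the failure."""
    return restored_time - failure_time <= rto


# Hypothetical timeline: backup at 09:00, failure at 09:45, restored at 13:45.
last_backup = datetime(2024, 1, 15, 9, 0)
failure_time = datetime(2024, 1, 15, 9, 45)
restored_time = datetime(2024, 1, 15, 13, 45)

print(data_loss_window(last_backup, failure_time))                    # 0:45:00
print(meets_rpo(last_backup, failure_time, timedelta(hours=1)))       # True
print(meets_rpo(last_backup, failure_time, timedelta(minutes=1)))     # False
print(meets_rto(failure_time, restored_time, timedelta(hours=4)))     # True
```

In this timeline, 45 minutes of data is lost. That satisfies a one-hour RPO but not a one-minute RPO, which would have required a backup no more than a minute before the failure.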
So Many Considerations; So Many Disaster-Recovery Solutions
Backup and disaster-recovery solutions span appliances, software, disaster-recovery-as-a-service (DRaaS) solutions, and more. Some solutions are a feature of a specific product, while others are stand-alone products that offer comprehensive backup and recovery for multiple other products or entire environments. The sheer number of available solutions can be mind-boggling.
Regardless of your needs, experts suggest starting with a disaster-recovery plan (DRP). You can find free templates online to get started. Creating a plan will help you document your needs, which will then help you wade through the available options and determine which solution might be best for your organization.
[ii] IHS Inc. “Businesses Losing $700 Billion a Year to IT Downtime, Says IHS.” January 2016. http://press.ihs.com/press-release/technology/businesses-losing-700-billion-year-it-downtime-says-ihs.