When disaster strikes, recovering the data center is essential. But even more important is getting all the critical business functions back in operation ASAP. Here's a checklist and template for all the steps you need to go through to create a comprehensive disaster recovery plan -- before trouble takes you down.
This article is being re-published due to its timeliness.
In 1992 Hurricane Andrew put 39 major data centers out of commission. And in 1993, the World Trade Center bombing caused 21 data centers to shut down. While you don't like to think about it, every organization, regardless of its size, runs the risk of a major systems outage, such as a tornado demolishing a data center or a building fire destroying the facility and everything in it. A study by the University of Texas found that 85 percent of businesses depend totally or heavily on information technology systems to stay in business, and that a loss of those systems would cost businesses up to 40 percent of their daily revenues.
Disaster can strike at any time. In fact, there are more than 35 types of disasters, ranging from the most common, such as power outages, to the most catastrophic, such as earthquakes. In essence, a disaster is any interruption of service that results from some force beyond the organization's control. Disaster recovery provides systematic procedures for reacting to and recovering from that external or internal force. Disaster recovery planning, which complements business continuity and contingency planning, ensures that the organization can function effectively if an unforeseen event severely disrupts normal operations.
The following checklist will help the key individuals in your organization work through the thought process of preparing a disaster recovery plan. The objective is to restore all critical business functions, not just an isolated function such as the data center.
Organize the Project
A successful initiative of this magnitude requires support from the organization's senior management, a dedicated disaster recovery team whose members know the critical business systems, and well-thought-out planning and testing strategies.
Senior executives responsible for disaster recovery planning will perform the first two steps. The disaster recovery coordinator, working with the appropriate team leaders, should perform steps 3 to 7.
- Determine which senior executive(s) will have overall responsibility for disaster recovery.
- Have this executive appoint a disaster recovery coordinator.
- Appoint a disaster recovery team leader for each operational unit, such as server backup or telephone system.
- Convene the disaster recovery planning team and sub-teams as appropriate.
- Working with senior executives responsible for disaster recovery, the disaster recovery coordinator should identify the following:
  - Scope -- the areas to be covered by the disaster recovery plan
  - Objectives -- what the disaster recovery team is working toward and the course of action it intends to follow
  - Assumptions -- what is being taken for granted or accepted as true without proof
- Set the project timetable and draft the project plan, including assignment of task responsibilities.
- Obtain senior management's approval for scope, assumptions, and project plan.
Conduct Business Impact Analysis
The disaster recovery planning team should perform this step to identify which business departments, functions, or systems are most vulnerable to potential threats, what types of threat are possible, and what effect each identified threat would have on each vulnerable area within the organization.
- Identify functions, processes, and systems.
- Interview information systems support personnel.
- Interview business unit personnel.
- Analyze results to determine critical systems, applications, and business processes.
- Prepare an impact analysis of interruptions to critical systems.
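One way to organize the results of the interviews is a small table of business functions ranked by estimated loss. The sketch below is only an illustration: the function names, hourly loss figures, and tolerable outage lengths are hypothetical placeholders, and the simple "hourly loss times outage hours" estimate is an assumption, not a method prescribed by this checklist.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    hourly_loss: float          # estimated cost per hour of interruption (hypothetical)
    max_tolerable_hours: float  # longest outage the business unit says it can absorb


def rank_by_impact(functions, outage_hours):
    """Return (name, estimated_loss, exceeds_tolerance) tuples, largest loss first."""
    rows = []
    for f in functions:
        loss = f.hourly_loss * outage_hours
        rows.append((f.name, loss, outage_hours > f.max_tolerable_hours))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical figures gathered from business unit interviews.
functions = [
    BusinessFunction("payroll processing", hourly_loss=500.0, max_tolerable_hours=48),
    BusinessFunction("order entry", hourly_loss=2000.0, max_tolerable_hours=4),
    BusinessFunction("telephone system", hourly_loss=800.0, max_tolerable_hours=8),
]

ranking = rank_by_impact(functions, outage_hours=24)
for name, loss, exceeds in ranking:
    flag = "exceeds tolerance" if exceeds else "within tolerance"
    print(f"{name:20s} ${loss:>10,.0f}  {flag}")
```

Functions that both rank high and exceed their tolerance are natural candidates for the list of critical systems.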
Conduct Risk Assessment
The disaster recovery planning team should work with the organization's technical and security personnel to determine the probability of each functional business unit's critical systems becoming severely disrupted and to document the amount of risk the business unit can tolerate. For each critical system, work through the following:
- Review physical security, e.g., secure offices and off-hours building access.
- Review backup systems and data security.
- Review policies on personnel termination and transfer.
- Identify systems supporting mission critical functions.
- Identify vulnerabilities to threats such as physical attacks or acts of God (floods, for example).
- Assess probability of system failure or disruption.
- Prepare risk and security analysis.
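The comparison between the probability of disruption and the acceptable level of risk can be made concrete with a simple score. The sketch below assumes a common "probability times impact" scoring scheme, which this checklist does not itself specify. The threat list, the 1-to-5 impact scale, and the acceptable-risk threshold are all hypothetical.

```python
def risk_score(probability, impact):
    """Yearly probability of disruption (0 to 1) times impact on a 1-5 scale."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


def assess(threats, acceptable_risk):
    """Split threats into those above and those within the acceptable risk level."""
    above, within = [], []
    for name, probability, impact in threats:
        score = risk_score(probability, impact)
        if score > acceptable_risk:
            above.append((name, score))
        else:
            within.append((name, score))
    # Highest-scoring threats first, so they are addressed first.
    above.sort(key=lambda item: item[1], reverse=True)
    return above, within


# Hypothetical estimates for one critical system.
threats = [
    ("power outage", 0.5, 3),
    ("building fire", 0.02, 5),
    ("flood", 0.1, 4),
]

above, within = assess(threats, acceptable_risk=0.3)
print("Needs mitigation:", [name for name, _ in above])
print("Within tolerance:", [name for name, _ in within])
```

The numbers matter less than the discipline: each business unit states its tolerance once, and every threat is measured against it the same way.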
Develop Strategic Outline for Recovery
The steps outlined here provide all of the components necessary to perform a recovery. They will help pull together information about the operations of all systems, especially those owned or managed by non-technical managers with help from technical support personnel. Steps one through four mainly apply to functional business units that manage technology systems to process critical functions. The disaster recovery planning team and the functional business unit may wish to appoint other appropriate individuals to perform subsequent tasks.
- Assemble groups as appropriate for the following:
  - Hardware and operating systems
  - Other critical functions and business processes as identified in the Business Impact Analysis step
- For each system/process above, quantify the following processing requirements:
  - Light, normal, and heavy processing days
  - Transaction volumes
  - Dollar volume, if any
  - Estimated process time
  - Allowable delays (days, hours, minutes, etc.)
- Detail all the steps in your workflow for each critical business function. (For example, for payroll processing, include each step that must be completed and the order in which to complete them.)
- Identify systems and applications:
  - Component name and technical identification, if any
  - Type (online, batch process, script)
  - Run time
  - Allowable delay (days, hours, minutes, etc.)
- Identify all vital records:
  - Name and description
  - Type (backup, original, master, history)
  - Where are they stored?
  - Source of item or record
  - Can the record be easily replaced by another source?
  - Backup and backup generation frequency
  - Number of backup generations available onsite and off-site
  - Location of backups
  - Media key, retention period, rotation cycle
  - Who is authorized to retrieve the backups?
- Identify what the minimum requirements for, or replacement of, the critical function would be if a severe disruption occurred:
  - Type (server hardware, software, research materials, etc.)
  - Item name and description
  - Quantity required
  - Location of inventory, alternative, or off-site storage
- Identify whether alternative processing methods exist or could be developed, quantifying their effect on processing (include manual processes).
- Identify the person(s) who support the system or application.
- Identify the primary person to contact if the system or application cannot function as normal.
- Identify the secondary person to contact if the system or application cannot function as normal.
- Identify all vendors associated with the system or application.
- Document the business unit's strategy during recovery (conceptually, how will the unit function?).
- Quantify the resources required for recovery by time frame.
- Develop and document the recovery strategy, including priorities for recovering system/function components, and the recovery schedule.
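The component details gathered in this step (type, run time, allowable delay, contacts, and vendors) can be kept in a simple structured inventory and used to derive recovery priorities. The following is a minimal sketch: the field names, sample components, and the rule "shortest allowable delay is recovered first" are illustrative assumptions, and a real plan would weigh other factors as well.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    kind: str                     # "online", "batch process", or "script"
    run_time_hours: float
    allowable_delay_hours: float  # how long the business can wait for it
    primary_contact: str
    secondary_contact: str
    vendors: list = field(default_factory=list)


def recovery_order(components):
    """Order components so the least delay-tolerant one is recovered first."""
    return sorted(components, key=lambda c: c.allowable_delay_hours)


# Hypothetical inventory entries; names and contacts are placeholders.
inventory = [
    Component("payroll run", "batch process", 3.0, 72.0, "A. Jones", "B. Smith"),
    Component("order entry", "online", 0.0, 2.0, "C. Lee", "D. Patel", ["ERP vendor"]),
    Component("nightly report script", "script", 0.5, 168.0, "E. Chen", "F. Diaz"),
]

schedule = recovery_order(inventory)
for position, component in enumerate(schedule, start=1):
    print(f"{position}. {component.name} "
          f"(allowable delay {component.allowable_delay_hours:g} h, "
          f"call {component.primary_contact}, then {component.secondary_contact})")
```

Keeping contacts on the same record as the component means the recovery schedule doubles as a call list.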
This article was originally published on Thursday Jul 12th 2001
Review On-site and Off-Site Backup and Recovery Procedures
The disaster recovery planning team should perform this task to ensure that a current backup of critical programs and data is available in the event of a disaster. In doing so, the disaster recovery planning team can reduce downtime and speed recovery.
- Review current records (operating systems, code).
- Review current off-site storage facility or arrange for one.
- Review backup and off-site backup storage policy or create one.
- Present to functional business unit leader for approval.
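Part of such a review can be automated by checking backup records against the storage policy. The sketch below assumes a hypothetical policy with two rules, a maximum backup age and a minimum number of off-site generations. The record layout, system names, and thresholds are placeholders rather than anything specified in this checklist.

```python
from datetime import date


def check_backups(records, today, max_age_days, min_offsite_generations):
    """Return (system, problem) pairs for records that break the policy."""
    problems = []
    for record in records:
        age_days = (today - record["last_backup"]).days
        if age_days > max_age_days:
            problems.append((record["system"], f"last backup is {age_days} days old"))
        if record["offsite_generations"] < min_offsite_generations:
            problems.append((record["system"], "too few off-site generations"))
    return problems


# Hypothetical backup records for three systems.
records = [
    {"system": "file server", "last_backup": date(2001, 7, 11), "offsite_generations": 3},
    {"system": "payroll database", "last_backup": date(2001, 6, 30), "offsite_generations": 2},
    {"system": "telephone system", "last_backup": date(2001, 7, 10), "offsite_generations": 0},
]

problems = check_backups(records, today=date(2001, 7, 12),
                         max_age_days=7, min_offsite_generations=1)
for system, problem in problems:
    print(f"{system}: {problem}")
```

A report like this gives the business unit leader something concrete to approve or reject.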
Select Alternate Facility
The disaster recovery planning team should look for a location, other than the normal facility, that can be used to process data and/or conduct business in the event of a disaster.
- Determine resource requirements.
- Assess platform uniqueness of unit systems (Macintosh, IBM, Oracle, etc.).
- Identify alternative facilities.
- Review cost/benefit.
- Evaluate and make recommendation.
- Present to business unit leader for approval.
- Make selection.
In part 2, we will cover Plan Development, Testing, and Ongoing Maintenance for your disaster recovery plan.