Disaster Recovery and Continuity Planning

Types of Disasters

Natural Disasters

  • Common types of natural disasters that may threaten an organization:
    • Earthquakes
    • Floods
    • Storms
    • Tsunamis
    • Volcanic eruptions
  • Exposure depends on an organization's physical location.

Man-Made Disasters

  • Types include:
    • Explosions
    • Electrical fires
    • Terrorist acts
    • Power outages/utility failures

Recovery Sites

Hot (Cost: High, Effort: Low)
  • Proactive site with servers and a live backup for immediate disaster response.
  • Replicates production environment.
  • Essential for mission-critical sites.

Warm (Cost: Medium, Effort: Medium)
  • Allows pre-installation of hardware and bandwidth pre-configuration.
  • Requires software and data load post-disaster.

Cold (Cost: Low, Effort: High)
  • A ready data center space and network.
  • Requires hardware movement and setup post-disaster.

Hot sites offer immediate recovery for mission-critical operations.
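
The cost and effort ratings above describe a trade-off that can be written down as a small lookup. The following is only an illustrative sketch: the numeric ranking of Low/Medium/High and the `cheapest_site` helper are assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass

# Assumed ordering of the qualitative ratings used in the comparison above.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class RecoverySite:
    name: str
    cost: str    # ongoing cost of keeping the site available
    effort: str  # effort needed to bring the site online after a disaster


# Ratings copied from the hot/warm/cold comparison.
SITES = [
    RecoverySite("Hot", cost="High", effort="Low"),
    RecoverySite("Warm", cost="Medium", effort="Medium"),
    RecoverySite("Cold", cost="Low", effort="High"),
]


def cheapest_site(max_effort: str) -> RecoverySite:
    """Return the lowest-cost site whose activation effort does not exceed max_effort."""
    candidates = [s for s in SITES if LEVELS[s.effort] <= LEVELS[max_effort]]
    return min(candidates, key=lambda s: LEVELS[s.cost])


print(cheapest_site("Low").name)     # Hot
print(cheapest_site("Medium").name)  # Warm
print(cheapest_site("High").name)    # Cold
```

The less post-disaster effort an organization can tolerate, the more it pays up front, which is why the hot site is the only option when effort must be low.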

Service Bureau

  • Companies that lease computer time.
  • Own large server farms and numerous workstations.
  • Can be onsite or remote.

Mobile Site

  • Alternative to traditional recovery sites.
  • Often self-contained units (e.g., trailers) that can be easily moved.

Multiple Sites

  • Combine two or more of the recovery site types above.
  • The mix varies with the organization's specific use cases.

Recovery Objectives

  • RPO (Recovery Point Objective): Maximum age of the files that must be recovered from backup storage for operations to resume if a system or network goes down.

  • RTO (Recovery Time Objective): Target duration and service level within which a business process must be restored post-disaster to avoid unacceptable consequences.

RPO focuses on data age; RTO focuses on recovery duration.
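
To make the distinction concrete, here is a minimal sketch that checks one incident against both objectives. The function names, timestamps, and 4-hour targets are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """Return True if the data written since the last backup is within the RPO."""
    data_loss_window = failure_time - last_backup
    return data_loss_window <= rpo


def meets_rto(failure_time: datetime, restored_time: datetime, rto: timedelta) -> bool:
    """Return True if the business process was restored within the RTO."""
    downtime = restored_time - failure_time
    return downtime <= rto


# Hypothetical incident: every timestamp and target below is a made-up value.
last_backup = datetime(2024, 1, 1, 2, 0)
failure_time = datetime(2024, 1, 1, 9, 30)
restored_time = datetime(2024, 1, 1, 12, 30)

rpo_ok = meets_rpo(last_backup, failure_time, rpo=timedelta(hours=4))
rto_ok = meets_rto(failure_time, restored_time, rto=timedelta(hours=4))

print(rpo_ok)  # False: 7.5 hours of changes would be lost, exceeding the 4-hour RPO
print(rto_ok)  # True: 3 hours of downtime is within the 4-hour RTO
```

In this example the recovery was fast enough, but the backup schedule was too sparse: RPO is met by backing up more often, RTO by restoring more quickly.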

Mutual Assistance Agreements (MAAs)

  • Pros:

    • Inexpensive alternative to disaster recovery sites.
    • Mutual aid during disasters.
    • Often seen between government agencies or commercial entities.
  • Cons/Risks:

    • Both parties may be affected by the same disaster.
    • Raises confidentiality concerns.
    • Difficult to enforce.

MAAs are uncommon in practice because they are hard to enforce and both parties may be hit by the same disaster.


Business Continuity Planning (BCP)

  • Recovery Team - Used to get critical business functions running at the alternative site.
  • Salvage Team (Restore) - Used to return the primary site to normal processing conditions.

  • The BCP process has four main steps:
    • Project Scope and Planning
    • Business Impact Assessment
    • Continuity Planning
    • Approval and Implementation

The goal of BCP is to provide an efficient response to enhance a company's ability to recover from a disruptive event.

BCP Definitions

📌 BCP - Overall organizational plan for how to continue business.

📌 COOP (Continuity of Operations Plan) - Plan for continuing business until the IT infrastructure can be restored.

📌 DRP (Disaster Recovery Plan) - Plan for recovering from a disaster and having the IT infrastructure back in operation.

📌 BRP (Business Resumption Plan) - Plan to move operations from the disaster recovery site back to your business environment, that is, back to normal operations.

Key Terms

  • MTBF (Mean Time Between Failures) - The average time a piece of IT infrastructure is expected to keep working before it fails.
  • MTTR (Mean Time To Repair) - The average time it takes to repair a piece of hardware or software and bring it back online.
  • MTD (Maximum Tolerable Downtime) - The longest an asset can be unavailable before we must declare a disaster and initiate the DRP.
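
As a rough illustration of how these terms fit together, the sketch below uses the commonly quoted approximation availability = MTBF / (MTBF + MTTR) and compares an expected outage against the MTD. The formula is a general reliability rule of thumb rather than something stated in these notes, and the helper names and figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Approximate fraction of time an asset is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def must_invoke_drp(expected_outage_hours: float, mtd_hours: float) -> bool:
    """Return True if the expected outage exceeds the maximum tolerable downtime."""
    return expected_outage_hours > mtd_hours


# Hypothetical asset: fails about every 990 hours and takes about 10 hours to repair.
avail = availability(mtbf_hours=990.0, mttr_hours=10.0)
print(f"{avail:.1%}")  # 99.0%

# With a made-up MTD of 8 hours, a 10-hour repair cannot be waited out.
print(must_invoke_drp(expected_outage_hours=10.0, mtd_hours=8.0))  # True
```

When the typical repair time is longer than the MTD, normal repair is not a viable response on its own, so the DRP has to be declared.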

Core Goals of DR and BCP

  1. Minimizing the effects of a disaster.
  2. Improving employee responsiveness in different situations.
  3. Easing confusion by providing written procedures and participation in drills.
  4. Helping make logical decisions during a crisis.

Types of Disaster Recovery Plan Tests

  • Read-Through - Distribute copies of disaster recovery plans to the DRP team for review.
  • Structured Walk-Through - Members of the DRP team role-play a disaster scenario in a large conference room. The scenario is typically known only to the test moderator.
  • Simulation Test - Similar to a structured walk-through, but some response measures are tested on the business's infrastructure.
  • Parallel Test - Relocate personnel to the alternative recovery site and implement site activation procedures.
  • Full Interruption Test - Like a parallel test, but involves shutting down the primary site and shifting operations to the recovery site.