Disasters are not exceptional events — they are inevitable. Power failures, ransomware attacks, hardware failures, datacenter fires, ISP outages, and accidental data deletion happen to every organization eventually. The question is not whether a disaster will occur but whether your organization can recover from it quickly enough to survive the business impact.
Disaster Recovery (DR) is the technical capability to restore IT systems after a disruptive event. Business Continuity Planning (BCP) is the broader program ensuring the organization can continue operating — even in degraded mode — while recovery is underway. Both are required. This guide walks through building both from scratch.
1 RTO & RPO — Defining Recovery Targets
Before designing any DR solution, you must define what "recovered" means for each system — how quickly must it be restored (RTO) and how much data loss is acceptable (RPO). These targets drive every architectural decision and cost implication in the DR program.
RTO ⏱️ — Recovery Time Objective
The maximum acceptable time between a disaster occurring and IT systems being fully restored and available to users.
Example: ERP system RTO = 4 hours means the ERP must be back online within 4 hours of a failure — at any time, day or night.
RPO 💾 — Recovery Point Objective
The maximum acceptable amount of data loss, measured in time — how far back in time can you afford to restore from?
Example: Database RPO = 1 hour means you can afford to lose at most 1 hour of transactions, so backups must run at least every hour.
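The RPO example above implies a constraint you can check mechanically: the backup (or replication) interval bounds your worst-case data loss. A minimal sketch, assuming the interval is the only source of loss (transfer and replication delay are ignored, and the function name is illustrative):

```python
from datetime import timedelta

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss equals the time since the last backup,
    so the backup interval must not exceed the RPO target."""
    return backup_interval <= rpo

# Hourly backups satisfy a 1-hour RPO; daily backups do not
assert meets_rpo(timedelta(minutes=60), timedelta(hours=1))
assert not meets_rpo(timedelta(hours=24), timedelta(hours=1))
```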
Business Impact Analysis (BIA) — Classify Every System
| System | Business Impact of Outage | Target RTO | Target RPO | DR Tier |
|---|---|---|---|---|
| Payment / POS systems | Revenue stops immediately — every minute = lost sales | < 15 min | < 5 min | Tier 1 — Hot Standby |
| Customer-facing website | Brand damage, lost leads, revenue impact | < 30 min | < 15 min | Tier 1 — Hot Standby |
| ERP / Core business app | Operations halt — orders, invoicing, procurement stop | < 4 hours | < 1 hour | Tier 2 — Warm Standby |
| Email (Exchange / M365) | Communication disrupted — moderate operational impact | < 4 hours | < 1 hour | Tier 2 — Warm Standby |
| Internal file servers | Productivity reduced — work continues manually | < 8 hours | < 4 hours | Tier 2 — Warm Standby |
| Dev / Test environments | Minimal — development paused temporarily | < 24 hours | < 24 hours | Tier 3 — Cold Standby |
| Archive / Reporting systems | Low — reports delayed, no operational impact | < 72 hours | < 24 hours | Tier 3 — Cold Standby |
✅ Pro Tip: Conduct the Business Impact Analysis as a joint exercise with business department heads — not just IT. IT cannot accurately assess the business cost of a 4-hour ERP outage without input from operations and finance. In practice, business leaders consistently assign tighter RTO/RPO requirements than IT expected — and are equally surprised by the cost implications. The BIA conversation aligns expectations and justifies DR investment with quantified business risk, making it far easier to get budget approved.
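When the BIA covers dozens of systems, the tier assignment can be expressed as a small classification rule. The sketch below is one illustrative reading of the tier boundaries in the table above — the thresholds are assumptions taken from it, not a standard:

```python
from datetime import timedelta

def dr_tier(rto: timedelta, rpo: timedelta) -> str:
    """Assign a DR tier from RTO/RPO targets. Either a tight RTO or a
    tight RPO pushes a system into a higher (more expensive) tier."""
    if rto <= timedelta(minutes=30) or rpo <= timedelta(minutes=15):
        return "Tier 1 — Hot Standby"
    if rto <= timedelta(hours=8) or rpo <= timedelta(hours=4):
        return "Tier 2 — Warm Standby"
    return "Tier 3 — Cold Standby"

# Matches the BIA table: ERP at 4h/1h lands in Tier 2
assert dr_tier(timedelta(hours=4), timedelta(hours=1)) == "Tier 2 — Warm Standby"
```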
2 DR Tier Strategy
DR solutions exist on a cost-vs-recovery-speed spectrum. The tighter your RTO/RPO requirement, the more expensive the DR solution. Matching each system to the appropriate DR tier is the most important cost optimization in DR design.
Tier 1 — Hot Standby
RTO: < 15 minutes | RPO: Near-zero (seconds to minutes)
A fully provisioned, continuously synchronized duplicate environment running in parallel. Failover is automatic or requires minimal manual steps. Data replication is synchronous (the primary waits for DR confirmation) or near-synchronous (asynchronous with under 60 seconds of lag). Examples: Multi-AZ RDS deployments, AWS Route53 health-check failover, Azure Traffic Manager active-active, on-premises SQL Server Always On Availability Groups. Cost: 1.8–2× the primary environment cost.
Tier 2 — Warm Standby
RTO: 1–4 hours | RPO: 15 minutes – 1 hour
A minimal DR environment running at reduced capacity (e.g., t3.micro instances with data replication active) that is scaled up to production size during a disaster. The "pilot light" variant keeps only the database and core services running — compute is provisioned on demand during failover. AWS: pilot light with AMI snapshots + RDS read replica promotion. Azure: Azure Site Recovery with pre-staged VMs. Cost: 30–50% of the primary environment cost.
Tier 3 — Cold Standby
RTO: 4–24 hours | RPO: 4–24 hours
No standing DR infrastructure — systems are rebuilt from backup during a disaster. Backups are stored in S3, Azure Blob, or with an offsite storage provider and restored to new infrastructure when needed. Suitable for non-critical systems that can tolerate 24-hour recovery. AWS: EC2 AMI snapshots + RDS automated backups. Azure: Azure Backup vault. Cost: storage only — typically 5–10% of the primary environment cost.
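The cost fractions quoted for each tier make the trade-off easy to put in front of budget holders. A rough estimator using those bands (the function and the example figure are illustrative; the fractions are the ones stated above):

```python
def annual_dr_cost(primary_annual_cost: float, tier: int) -> tuple[float, float]:
    """Rough annual DR cost band per tier, as a fraction of the
    primary environment: Tier 1 = 1.8-2x, Tier 2 = 30-50%, Tier 3 = 5-10%."""
    bands = {1: (1.8, 2.0), 2: (0.30, 0.50), 3: (0.05, 0.10)}
    lo, hi = bands[tier]
    return primary_annual_cost * lo, primary_annual_cost * hi

lo, hi = annual_dr_cost(100_000, tier=2)   # roughly 30,000 - 50,000 per year
```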
3 Backup Architecture & 3-2-1 Rule
Backups are the foundation of any DR program — but backups that have never been tested are not backups, they are hopes. A structured backup architecture with the 3-2-1 rule and regular restore testing is the minimum viable DR for every organization.
The 3-2-1 Backup Rule
- 3 — Copies of data: the original production data plus two additional backup copies — never rely on a single copy
- 2 — Different media types: store backups on at least two different media (e.g., local NAS + cloud storage) — protects against media failure
- 1 — Offsite copy: at least one copy stored offsite or in a different cloud region — protects against site-wide disasters and ransomware
Extended 3-2-1-1-0 Rule (Modern Standard)
- +1 Immutable copy: One backup copy must be immutable (cannot be modified or deleted for a defined period) — AWS S3 Object Lock, Azure Blob Immutable Storage, or air-gapped tape. This is the ransomware-proof copy — attackers who compromise your backup server cannot delete immutable backups
- +0 Errors: Zero backup errors — every backup job must be verified. Automated restore testing must confirm recoverability. A backup with errors is not a backup
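The 3-2-1-1-0 rule can be checked automatically against a backup inventory. A sketch under stated assumptions — the `BackupCopy` record and field names are invented for illustration; `copies` lists the backup copies only, excluding the production original:

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    media: str              # e.g. "local-nas", "s3", "tape"
    offsite: bool
    immutable: bool
    last_verified_ok: bool  # last restore test / verification passed

def satisfies_3_2_1_1_0(copies: list[BackupCopy]) -> dict[str, bool]:
    """Check each clause of the rule: 3 total copies (so >= 2 backups),
    2 media types, 1 offsite, 1 immutable, 0 verification errors."""
    return {
        "3-copies": len(copies) >= 2,
        "2-media": len({c.media for c in copies}) >= 2,
        "1-offsite": any(c.offsite for c in copies),
        "1-immutable": any(c.immutable for c in copies),
        "0-errors": all(c.last_verified_ok for c in copies),
    }
```

Running this nightly against the backup catalog turns the rule from a slogan into an enforced control.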
Backup Schedule Design
| Backup Type | Frequency | Retention | Method | Verification |
|---|---|---|---|---|
| Continuous / Transaction log | Every 15–60 min | 24–48 hours | DB log shipping, AWS DMS CDC | Auto (replication lag check) |
| Hourly incremental | Every hour | 7 days | Veeam, Azure Backup, AWS Backup | Daily automated restore test |
| Daily full | Every 24 hours (2 AM) | 30 days | Full VM snapshot or DB dump | Weekly manual restore test |
| Weekly full | Every Sunday | 90 days | Full backup to separate vault | Monthly full restore drill |
| Monthly archive | 1st of each month | 1–7 years | Cold storage (Glacier / Archive) | Annual restore test |
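A practical consequence of this schedule: the older a recovery point you need, the coarser the backup tier that can serve it. A sketch mapping restore-point age to the retention windows in the table (names and the day-granularity simplification are illustrative):

```python
def restore_sources(age_days: float) -> list[str]:
    """Which backup tiers from the schedule still retain a recovery
    point this old. Retention windows follow the table above."""
    retention_days = {
        "transaction-log": 2,        # 24-48 h
        "hourly": 7,
        "daily": 30,
        "weekly": 90,
        "monthly-archive": 365 * 7,  # up to 7 years
    }
    return [name for name, days in retention_days.items() if age_days <= days]

restore_sources(45)   # only weekly and monthly-archive copies still exist
```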
AWS Backup — Centralized Backup Policy
# AWS Backup — create an organization-wide backup plan
aws backup create-backup-plan --backup-plan '{
  "BackupPlanName": "EnterWeb-DR-Backup-Plan",
  "Rules": [
    {
      "RuleName": "Hourly-7day-retention",
      "TargetBackupVaultName": "EnterWeb-Primary-Vault",
      "ScheduleExpression": "cron(0 * ? * * *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 120,
      "Lifecycle": { "DeleteAfterDays": 7 },
      "CopyActions": [
        {
          "DestinationBackupVaultArn": "arn:aws:backup:ap-south-2:ACCOUNT:backup-vault:EnterWeb-DR-Vault",
          "Lifecycle": { "DeleteAfterDays": 30 }
        }
      ]
    },
    {
      "RuleName": "Daily-30day-retention",
      "TargetBackupVaultName": "EnterWeb-Primary-Vault",
      "ScheduleExpression": "cron(0 2 ? * * *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 480,
      "Lifecycle": {
        "MoveToColdStorageAfterDays": 30,
        "DeleteAfterDays": 90
      }
    }
  ]
}'

# Create a dedicated vault for immutable copies
aws backup create-backup-vault \
  --backup-vault-name "EnterWeb-Immutable-Vault" \
  --creator-request-id "enterweb-dr-2026"

# Apply Vault Lock — recovery points cannot be deleted before min retention
aws backup put-backup-vault-lock-configuration \
  --backup-vault-name "EnterWeb-Immutable-Vault" \
  --min-retention-days 30 \
  --max-retention-days 365
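The "+0 errors" clause means backup job states must actually be checked, not assumed. A minimal sketch that filters job records for failures — the dictionaries mirror a subset of the fields returned by AWS Backup's ListBackupJobs API, which in practice you would fetch with boto3 rather than hard-code:

```python
def failed_backup_jobs(jobs: list[dict]) -> list[str]:
    """Return resource names whose backup job did not complete.
    Each job dict carries at least ResourceName and State."""
    bad_states = {"FAILED", "ABORTED", "EXPIRED"}
    return sorted(j["ResourceName"] for j in jobs if j["State"] in bad_states)

jobs = [
    {"ResourceName": "erp-db", "State": "COMPLETED"},
    {"ResourceName": "hr-db", "State": "FAILED"},
]
failed_backup_jobs(jobs)   # → ['hr-db']
```

Wire the result into an alert channel — a failed backup that nobody notices is indistinguishable from no backup.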
🚨 Critical — Ransomware Protection: Standard cloud backups are NOT ransomware-proof unless you enable immutability. If your AWS or Azure backup account credentials are compromised, attackers can delete all backup recovery points before encrypting your production data — eliminating your ability to recover without paying ransom. Enable AWS Backup Vault Lock or Azure Backup soft-delete with immutability on ALL backup vaults immediately. This single control is the most important ransomware resilience measure for organizations using cloud backup.
4 AWS DR Setup
AWS provides several native services for DR — the right combination depends on your RTO/RPO targets and whether you need Tier 1 (hot), Tier 2 (warm), or Tier 3 (cold) recovery for each workload.
AWS Pilot Light DR Architecture
# ── AWS Pilot Light DR Setup ─────────────────────────────
# Primary region: ap-south-1 (Mumbai)
# DR region:      ap-south-2 (Hyderabad)

# Step 1: RDS read replica in the DR region (data kept current)
# Cross-region replicas are created from the destination region,
# and the source instance must be referenced by its full ARN
aws rds create-db-instance-read-replica \
  --region ap-south-2 \
  --db-instance-identifier "erp-db-dr-hyderabad" \
  --source-db-instance-identifier "arn:aws:rds:ap-south-1:ACCOUNT:db:erp-db-primary-mumbai" \
  --source-region ap-south-1 \
  --db-instance-class db.t3.medium \
  --availability-zone ap-south-2a \
  --no-publicly-accessible \
  --tags Key=Environment,Value=DR Key=Tier,Value=2

# Step 2: Pre-bake AMIs in the DR region (updated weekly)
# Create an AMI from the production EC2 instance in Mumbai
PROD_AMI=$(aws ec2 create-image \
  --instance-id i-0abc123def456789 \
  --name "ERP-App-DR-$(date +%Y%m%d)" \
  --no-reboot \
  --query ImageId --output text)

# Copy the AMI to the Hyderabad DR region
aws ec2 copy-image \
  --source-image-id "$PROD_AMI" \
  --source-region ap-south-1 \
  --region ap-south-2 \
  --name "ERP-App-DR-$(date +%Y%m%d)-Hyderabad"

# Step 3: Pre-create the VPC and subnets in the DR region (matching primary)
# Run Terraform or CloudFormation — keep the DR network config in IaC.
# Never configure DR infrastructure manually — it drifts from primary.

# Step 4: Route53 health check + failover DNS
aws route53 create-health-check --caller-reference "erp-prod-$(date +%s)" \
  --health-check-config '{
    "IPAddress": "15.207.x.x",
    "Port": 443,
    "Type": "HTTPS",
    "ResourcePath": "/health",
    "FullyQualifiedDomainName": "erp.enterweb.in",
    "RequestInterval": 10,
    "FailureThreshold": 3
  }'
# Primary DNS record (Failover = PRIMARY)
# Secondary DNS record pointing to the DR ALB (Failover = SECONDARY)
# Route53 switches to DR once the health check fails 3× (~30 seconds)

# Step 5: Failover runbook — promote the RDS read replica to standalone
aws rds promote-read-replica \
  --region ap-south-2 \
  --db-instance-identifier "erp-db-dr-hyderabad" \
  --backup-retention-period 7

# Step 6: Launch EC2 from the pre-baked AMI in the DR region
aws ec2 run-instances \
  --region ap-south-2 \
  --image-id ami-DR-IMAGE-ID \
  --instance-type m5.xlarge \
  --subnet-id subnet-DR-SUBNET \
  --security-group-ids sg-DR-SG \
  --iam-instance-profile Name=ERP-App-Role \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ERP-DR-App}]'
✅ Pro Tip: Use AWS CloudFormation or Terraform to define all DR infrastructure as code and store it in a Git repository. During a disaster, spinning up the DR environment is as simple as running one command — terraform apply -var="environment=dr" — which provisions all required EC2 instances, load balancers, security groups, and Route53 records in the DR region in under 10 minutes. Infrastructure as code eliminates the most common DR failure: the DR environment configuration has drifted from production because manual changes were never replicated.
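The health-check failover in Step 4 reduces to a simple rule: fail over once the last N consecutive probes have failed (here N=3 at a 10-second interval, so worst-case detection is about 30 seconds). A simplified model of that decision logic — the function is an illustration of the behaviour, not Route53's implementation:

```python
def failover_decision(probe_results: list[bool], failure_threshold: int = 3) -> bool:
    """True once the last `failure_threshold` probes all failed.
    probe_results holds health-check outcomes, oldest first."""
    if len(probe_results) < failure_threshold:
        return False
    return not any(probe_results[-failure_threshold:])

assert failover_decision([True, True, False, False, False]) is True   # 3 straight failures
assert failover_decision([True, False, False, True]) is False         # recovered mid-window
```

The threshold trades detection speed against flapping: a single dropped probe should not swing DNS to the DR region.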
5 Azure Site Recovery
Azure Site Recovery (ASR) is Microsoft's managed DR replication service — it continuously replicates on-premises VMware, Hyper-V, and physical servers (or Azure VMs between regions) to Azure, enabling failover with minimal data loss and automated recovery plan execution.
ASR Setup for On-Premises VMware to Azure
# Azure Site Recovery — On-Premises VMware → Azure
# Prerequisites:
# - Azure Recovery Services Vault in target region
# - Configuration Server (Windows Server 2019) on-premises
# - Network connectivity: On-premises → Azure (VPN or ExpressRoute)
# Step 1: Create Recovery Services Vault
az backup vault create \
  --resource-group EnterWeb-DR-RG \
  --name EnterWeb-ASR-Vault \
  --location centralindia
# Step 2: Download and install Configuration Server on-premises
# Azure Portal → Recovery Services Vault → Site Recovery
# → Prepare Infrastructure → VMware → Download Configuration Server installer
# Install on Windows Server 2019 (16 vCPU, 32GB RAM minimum)
# Register with vault using downloaded credentials file
# Step 3: Install Mobility Service on VMs to protect
# ASR Portal → Replicated Items → Add → Source: On-Premises
# Select VMs to replicate → Install Mobility Agent automatically via push install
# OR deploy via SCCM/Group Policy for large deployments
# Step 4: Configure replication policy
az site-recovery policy create \
  --resource-group EnterWeb-DR-RG \
  --vault-name EnterWeb-ASR-Vault \
  --name "EnterWeb-Replication-Policy" \
  --provider-specific-input '{
    "instanceType": "InMageRcm",
    "recoveryPointHistoryInMinutes": 1440,
    "crashConsistentFrequencyInMinutes": 5,
    "appConsistentFrequencyInMinutes": 60
  }'
# Recovery point history: 24 hours of recovery points
# Crash-consistent: every 5 minutes (RPO = 5 min)
# App-consistent: every 60 minutes (VSS snapshot — DB consistency)
# Step 5: Create Recovery Plan (defines failover sequence)
# ASR → Recovery Plans → Create Recovery Plan
# Name: EnterWeb-Full-DR-Plan
# Source: On-Premises | Target: Central India (Azure)
# Order groups:
# Group 1: AD Domain Controllers (must come up first)
# Group 2: Database servers (ERP DB, HR DB)
# Group 3: Application servers (ERP App, HR App)
# Group 4: Web / Reverse proxy servers
# Pre/Post scripts in recovery plan:
# Before Group 2: Run Azure Automation runbook to configure NSG rules
# After Group 3: Run runbook to update DNS records in Azure Private DNS
# After Group 4: Run runbook to verify application health endpoints
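With crash-consistent recovery points every 5 minutes, the RPO you would actually achieve at any moment is simply the age of the newest recovery point. A small monitoring sketch — the timestamps and the 10-minute alert band are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

def current_rpo(last_recovery_point: datetime, now: datetime) -> timedelta:
    """Data loss if you failed over right now = age of the newest
    recovery point reported by the replication service."""
    return now - last_recovery_point

now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
rp = datetime(2026, 3, 1, 11, 52, tzinfo=timezone.utc)
assert current_rpo(rp, now) == timedelta(minutes=8)
assert current_rpo(rp, now) <= timedelta(minutes=10)   # within the alert band
```

Alerting when this value drifts past the policy's RPO catches broken replication days before a disaster exposes it.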
ASR Test Failover — Non-Disruptive DR Test
# Test Failover — isolated network, no production impact
# ASR Portal → Recovery Plans → EnterWeb-Full-DR-Plan
# → Test Failover → Select recovery point → Select test network
# OR via CLI:
az site-recovery recovery-plan start-failover \
  --resource-group EnterWeb-DR-RG \
  --vault-name EnterWeb-ASR-Vault \
  --recovery-plan-name "EnterWeb-Full-DR-Plan" \
  --properties '{
    "failoverDirection": "PrimaryToRecovery",
    "skipChangeOfSourceControl": false,
    "providerSpecificDetails": [{"instanceType": "InMageRcm"}]
  }'
# After test failover completes:
# 1. Connect to isolated Azure VMs — verify they booted correctly
# 2. Test application functionality end-to-end in isolated environment
# 3. Record: Time to complete failover, any errors, services that needed manual intervention
# 4. Clean up test failover (removes test VMs, no impact on replication)
# Document test results:
# - Failover completion time: [actual time vs RTO target]
# - Data loss at failover point: [actual RPO vs target]
# - Issues discovered: [list any services that failed to start]
# - Action items: [configuration fixes before next test]
6 Database Replication & Recovery
Databases are the most critical and most complex component of DR — they hold the organization's irreplaceable data and require special handling to ensure consistency at the recovery point.
MySQL / MariaDB Replication for DR
# MySQL Master-Replica replication for DR
# Primary (Mumbai):      10.10.3.10 — production database
# DR replica (Hyderabad): 10.20.3.10 — read-only standby
# ── Primary Server Configuration (/etc/mysql/my.cnf) ────
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = erp_production
binlog_do_db = hr_production
binlog_expire_logs_seconds = 604800 # 7-day binary log retention
sync_binlog = 1 # Sync binlog to disk each write
innodb_flush_log_at_trx_commit = 1 # ACID compliance — every transaction flushed
# Create replication user on primary
mysql> CREATE USER 'replication_user'@'10.20.3.10'
       IDENTIFIED WITH mysql_native_password BY 'StrongReplPass!';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'10.20.3.10';
mysql> FLUSH PRIVILEGES;
mysql> SHOW MASTER STATUS; -- Note the File and Position values
# ── DR Replica Configuration ─────────────────────────────
[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin
read_only = ON # DR replica is read-only until failover
super_read_only = ON # Prevents even SUPER users writing
# Configure replica to connect to primary
mysql> CHANGE REPLICATION SOURCE TO
       SOURCE_HOST='10.10.3.10',
       SOURCE_USER='replication_user',
       SOURCE_PASSWORD='StrongReplPass!',
       SOURCE_LOG_FILE='mysql-bin.000001',  -- from SHOW MASTER STATUS
       SOURCE_LOG_POS=157;
mysql> START REPLICA;
mysql> SHOW REPLICA STATUS\G  -- Verify: Seconds_Behind_Source = 0
# ── DR Failover procedure (if primary fails) ─────────────
mysql> STOP REPLICA;
mysql> SET GLOBAL read_only = OFF;
mysql> SET GLOBAL super_read_only = OFF;
# Update application config to point to DR replica IP
# DNS change: db.enterweb.local → 10.20.3.10
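Replication lag is what turns a replica from a DR asset into a liability, so it should be monitored against the RPO continuously. One subtlety worth encoding: `SHOW REPLICA STATUS` reports `Seconds_Behind_Source` as NULL when a replication thread has stopped — that must be treated as a failure, not as zero lag. A sketch (function name and default threshold are illustrative):

```python
def replication_ok(seconds_behind_source, rpo_seconds: int = 3600) -> bool:
    """True if the replica's lag is within the RPO budget.
    None means the I/O or SQL thread is stopped — always a failure."""
    if seconds_behind_source is None:
        return False
    return seconds_behind_source <= rpo_seconds

assert replication_ok(0)
assert not replication_ok(None)      # broken replication, not "no lag"
assert not replication_ok(6400)      # ~1h47m — the HR DB failure in section 7
```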
PostgreSQL Streaming Replication
# PostgreSQL Streaming Replication (pg_basebackup + WAL)
# Primary: /etc/postgresql/15/main/postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1024 # Keep 1GB of WAL segments
synchronous_standby_names = '' # Async replication (RPO ~seconds)
# For synchronous (RPO=0, performance impact):
# synchronous_standby_names = 'dr-replica'
# Primary pg_hba.conf — allow replica connection
host replication repl_user 10.20.3.10/32 scram-sha-256
# Create replication user
psql> CREATE USER repl_user REPLICATION LOGIN
ENCRYPTED PASSWORD 'StrongReplPass!';
# Initialize DR standby from primary
pg_basebackup -h 10.10.3.10 -U repl_user -D /var/lib/postgresql/15/main \
  -Fp -Xs -P -R
# -R flag creates standby.signal + postgresql.auto.conf automatically
# Start DR replica
systemctl start postgresql
psql> SELECT * FROM pg_stat_replication; -- Verify on primary
# Should show DR replica with write_lag = 0
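The `wal_keep_size` setting above bounds how long the standby can be unreachable before the primary recycles WAL the standby still needs (after which a full re-seed with pg_basebackup is required). The arithmetic is a simple division — the WAL generation rate below is an assumed figure; measure your own workload:

```python
def max_standby_downtime_minutes(wal_keep_size_mb: int,
                                 wal_rate_mb_per_min: float) -> float:
    """How long the standby can fall behind before retained WAL
    is exhausted and replication must be re-seeded."""
    return wal_keep_size_mb / wal_rate_mb_per_min

# With wal_keep_size = 1024 MB and an assumed 8 MB/min of WAL:
max_standby_downtime_minutes(1024, wal_rate_mb_per_min=8)   # → 128.0 minutes (~2 hours)
```

If two hours is not enough headroom, raise `wal_keep_size` or configure a WAL archive (`archive_command` / replication slots) as a fallback.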
7 DR Testing & Validation
An untested DR plan is not a DR plan — it is a theory. The only way to know your DR will work when needed is to test it regularly, document the results honestly, and fix every issue discovered before the next test.
DR Test Types & Frequency
| Test Type | Description | Frequency | Production Impact |
|---|---|---|---|
| Tabletop Exercise | Walk through the DR runbook verbally — identify gaps without executing any steps | Monthly | None |
| Backup Restore Test | Restore a backup to an isolated environment — verify data completeness and app functionality | Monthly (per critical system) | None (isolated) |
| Component Failover Test | Fail over a single non-critical component (e.g., one DB replica promotion) — verify the process works | Quarterly | Minor — short planned outage |
| Full DR Simulation | Fail over all Tier 1 and Tier 2 systems to the DR environment — run production from DR for 2–4 hours | Semi-annually | Planned maintenance window required |
| Unannounced DR Test | Surprise failover drill — tests team readiness, not just the runbook | Annually | Planned, but team not pre-briefed |
DR Test Report Template
# DR TEST REPORT
# Date: [Date]
# Test Type: [Backup Restore / Component Failover / Full DR Simulation]
# Systems Tested: [List]
# Test Lead: [Name]
# Participants: [Names]
TARGETS vs ACTUALS:
┌─────────────────┬──────────────┬──────────────┬────────┐
│ System │ Target RTO │ Actual RTO │ Result │
├─────────────────┼──────────────┼──────────────┼────────┤
│ ERP Application │ 4 hours │ 2h 45min │ ✅ PASS │
│ HR System │ 4 hours │ 5h 10min │ ❌ FAIL │
│ Email (M365) │ N/A (SaaS) │ N/A │ ✅ N/A │
└─────────────────┴──────────────┴──────────────┴────────┘
┌─────────────────┬──────────────┬──────────────┬────────┐
│ System │ Target RPO │ Actual Data │ Result │
│ │ │ Loss │ │
├─────────────────┼──────────────┼──────────────┼────────┤
│ ERP Database │ 1 hour │ 23 minutes │ ✅ PASS │
│ HR Database │ 1 hour │ 1h 47min │ ❌ FAIL │
└─────────────────┴──────────────┴──────────────┴────────┘
ISSUES DISCOVERED:
1. HR System: DR VM failed to start — AMI outdated (6 months old)
Action: Update AMI weekly via automated Lambda function
Owner: DevOps Team | Due: [Date]
2. HR Database: Replication lag was 1h 47min at time of test
Action: Investigate replication lag — check network/disk I/O
Owner: DBA Team | Due: [Date]
NEXT TEST DATE: [Date + 90 days]
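The PASS/FAIL columns in the report reduce to one comparison per system: actual against target. Automating it keeps test grading honest and consistent across test cycles. A minimal sketch using the sample figures from the report above:

```python
from datetime import timedelta

def evaluate(target: timedelta, actual: timedelta) -> str:
    """Grade an RTO or RPO result: within target passes, over fails."""
    return "PASS" if actual <= target else "FAIL"

# RTO results from the sample report
assert evaluate(timedelta(hours=4), timedelta(hours=2, minutes=45)) == "PASS"  # ERP
assert evaluate(timedelta(hours=4), timedelta(hours=5, minutes=10)) == "FAIL"  # HR
```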
⚠️ Warning: Organizations that test DR and honestly document failures — including missed RTO/RPO targets, runbook gaps, and team knowledge deficiencies — consistently have better actual DR outcomes than organizations that pass every test by setting soft targets or avoiding realistic failure scenarios. A DR test that reveals problems is a success: it found issues that would have been catastrophic during a real disaster. A DR test where everything works perfectly should increase suspicion, not confidence — it may mean the test scenario was insufficiently realistic.
8 Business Continuity Plan (BCP)
DR recovers technology. BCP keeps the business operating while technology recovery is underway. A complete BCP addresses people, processes, communication, and manual workarounds — covering the hours or days between a disaster occurring and IT systems being restored.
BCP Core Components
- Crisis communication plan: Who notifies whom, in what order, using what channels when a disaster is declared. Define primary and backup communication methods — if email is down, use WhatsApp. If phone networks are down, use a pre-designated physical assembly point
- Incident declaration criteria: Precise, unambiguous conditions that trigger BCP activation — removes decision-making delay during high-stress events. Example: "If any Tier 1 system is unavailable for >30 minutes, the IT Manager declares an incident and activates BCP"
- Manual workaround procedures: For each critical business process, document how it can be performed without IT systems. Invoicing on paper forms, order taking via phone with manual logs, payment processing via backup POS terminals
- Alternate work locations: If the office is inaccessible, where do staff work? Cloud-based tools (M365, Google Workspace) enable working from home for knowledge workers. For operations staff, identify and pre-arrange access to an alternate facility
- Vendor and supplier contacts: Maintain an offline-accessible list of all critical vendor contacts — ISP NOC numbers, cloud provider support contacts, hardware vendor emergency support, key supplier account managers
- Staff responsibilities matrix: RACI chart for every recovery activity — who is Responsible, Accountable, Consulted, and Informed for each action in the recovery playbook
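The value of precise declaration criteria is that they can be evaluated mechanically, removing judgment calls at 3 AM. A sketch encoding the Tier 1 example above — the Tier 2 and Tier 3 thresholds are illustrative assumptions, not from this document:

```python
from datetime import timedelta

def declare_incident(tier: int, outage_duration: timedelta) -> bool:
    """Tier 1 systems down > 30 minutes trigger BCP activation
    (the document's example). Tier 2/3 thresholds are assumed."""
    thresholds = {
        1: timedelta(minutes=30),
        2: timedelta(hours=2),    # assumption
        3: timedelta(hours=8),    # assumption
    }
    return outage_duration > thresholds[tier]

assert declare_incident(1, timedelta(minutes=45))       # Tier 1 down 45 min → declare
assert not declare_incident(1, timedelta(minutes=10))   # brief blip → no declaration
```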
BCP Document Structure
# BUSINESS CONTINUITY PLAN — EnterWeb IT Firm
# Document Version: 2.0 | Last Tested: March 2026
# Owner: IT Director | Review cycle: Annual
# SECTION 1 — SCOPE AND OBJECTIVES
1.1 Purpose and scope of this BCP
1.2 RTO/RPO targets by system tier
1.3 Assumptions and exclusions
# SECTION 2 — INCIDENT RESPONSE TEAM
2.1 Incident Commander: [Name, Phone, Email, WhatsApp]
2.2 IT Recovery Lead: [Name, Phone, Email, WhatsApp]
2.3 Business Operations Lead: [Name, Phone, Email]
2.4 Communications Lead: [Name, Phone, Email]
2.5 Backup contacts for each role (in case primary unavailable)
# SECTION 3 — INCIDENT DECLARATION AND ESCALATION
3.1 Incident severity levels (P1/P2/P3)
3.2 Declaration criteria per severity level
3.3 Notification cascade (who calls whom)
3.4 Bridge/war-room setup instructions
# SECTION 4 — SYSTEM RECOVERY PROCEDURES
4.1 Tier 1 systems — recovery runbooks (link to separate docs)
4.2 Tier 2 systems — recovery runbooks
4.3 Tier 3 systems — restore from backup procedures
# SECTION 5 — MANUAL BUSINESS OPERATIONS
5.1 Order processing without ERP
5.2 Invoicing and billing without ERP
5.3 HR processes without HRMS
5.4 Communication without corporate email
# SECTION 6 — VENDOR CONTACTS (printed copy mandatory)
6.1 ISP NOC emergency numbers
6.2 Cloud provider support contacts
6.3 Hardware vendor support
6.4 Cybersecurity incident response retainer
# SECTION 7 — TESTING AND MAINTENANCE
7.1 Test schedule and history
7.2 Document review and update procedure
7.3 Post-incident review process
90-Day DR Programme Setup Checklist
- Conduct Business Impact Analysis with department heads — classify all systems by RTO/RPO
- Implement 3-2-1-1-0 backup architecture — verify with automated restore tests
- Enable immutable backup vault (AWS Backup Vault Lock / Azure Backup soft-delete)
- Deploy AWS Backup plan or Azure Backup vault for all production workloads
- Set up pilot light DR environment in a secondary region for Tier 1/2 systems
- Configure Route53 or Azure Traffic Manager health-check-based DNS failover
- Deploy database replication (MySQL replica / RDS read replica) to the DR region
- Document the DR runbook with step-by-step failover procedure and rollback
- Conduct first tabletop exercise — walk through a ransomware scenario
- Conduct first backup restore test — restore a DB to an isolated environment, verify data
- Write and distribute the BCP document — including printed copies to key staff
- Schedule quarterly DR tests — add to the IT calendar for the next 12 months
Need to Build a DR Program?
EnterWeb IT Firm designs and implements end-to-end Disaster Recovery and Business Continuity programs — from Business Impact Analysis and RTO/RPO definition through backup architecture, AWS/Azure DR setup, runbook documentation, and quarterly DR testing for Indian enterprises.