High Availability & Disaster Recovery
The HA/DR Architecture Map
Oracle DBA parallel: local HA corresponds to RAC, and cross-region DR corresponds to Data Guard. The big difference: in Azure, local HA is built into the service tier, so you don't configure it. You only configure DR (Failover Groups, Geo-Replication).
Backup & Restore — The Safety Net
For a comprehensive deep dive into backup, restore, PITR, and LTR, see the dedicated page: Backup, Restore, PITR & LTR.
Automated Backups (Zero Configuration)
| Backup Type | Frequency | You Configure? |
|---|---|---|
| Full | Weekly | ❌ Automatic |
| Differential | Every 12-24 hours | ❌ Automatic |
| Transaction log | Every 5-10 minutes | ❌ Automatic |

| Setting | Range | Default |
|---|---|---|
| Short-term retention (PITR) | 1-35 days | 7 days |
| Long-term retention (LTR) | Up to 10 years | Not configured |
| Backup storage redundancy | LRS / ZRS / GRS / RA-GRS | GRS |
Backup frequencies are exam favorites: Full = weekly, Diff = 12-24h, Log = 5-10 min. Point-in-time restore creates a new database — it NEVER overwrites the existing one. This catches many exam takers.
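To see how the three backup types combine during a point-in-time restore, here is a minimal sketch of a generic full + differential + log restore chain. It assumes a textbook backup chain and is not a description of how the Azure service works internally; `restore_chain` and the sample schedule are hypothetical names and values chosen for illustration.

```python
from datetime import datetime, timedelta


def restore_chain(backups, target):
    """Return the backups to replay, in order, to reach `target`.

    `backups` is a list of (kind, finished_at) tuples with kind in
    {"full", "diff", "log"}. Generic model, not the service's internals.
    """
    ordered = sorted(backups, key=lambda b: b[1])
    fulls = [b for b in ordered if b[0] == "full" and b[1] <= target]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    base = fulls[-1]
    # Only the most recent differential after the full backup is needed.
    diffs = [b for b in ordered if b[0] == "diff" and base[1] < b[1] <= target]
    anchor = diffs[-1] if diffs else base
    chain = [base] + ([anchor] if diffs else [])
    for b in ordered:
        if b[0] != "log" or b[1] <= anchor[1]:
            continue
        chain.append(b)
        if b[1] >= target:
            break  # this log covers the target; replay it only up to `target`
    return chain


# Hypothetical schedule: weekly full, 12-hourly differential, 10-minute logs.
start = datetime(2024, 1, 7)
backups = [("full", start)]
backups += [("diff", start + timedelta(hours=12 * i)) for i in range(1, 5)]
backups += [("log", start + timedelta(minutes=10 * i)) for i in range(1, 300)]

target = start + timedelta(hours=27, minutes=25)
chain = restore_chain(backups, target)
print(len(chain), chain[0][0], chain[1][0], chain[-1][1])
```

The result of replaying this chain is a new database at the target time, which matches the point above that PITR never overwrites the existing database.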
Restore Options
| Scenario | Method | Recovery point |
|---|---|---|
| "Restore to 3 hours ago" | PITR (point-in-time restore) | Seconds (within retention) |
| "Restore deleted database" | Deleted database restore | To deletion time |
| "Keep backup for 5 years" | Long-Term Retention (LTR) | Weekly/monthly/yearly snapshots |
| "Restore to different region" | Geo-restore from GRS backup | Up to 1 hour lag |
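The table can be read as a small decision procedure. The sketch below encodes it as a function; the parameter names, the 7-day default, and the order in which the checks run are assumptions made for illustration, not an Azure API.

```python
def pick_restore_method(deleted=False, cross_region=False, days_back=0.0, pitr_days=7):
    """Map a restore scenario to the method named in the table above.

    The precedence of the checks is an illustrative assumption.
    """
    if cross_region:
        return "Geo-restore from GRS backup"
    if deleted:
        return "Deleted database restore"
    if days_back <= pitr_days:
        return "PITR (point-in-time restore)"
    # Older than the short-term window: only LTR backups can still exist.
    return "Long-Term Retention (LTR)"


print(pick_restore_method(days_back=3 / 24))    # "restore to 3 hours ago"
print(pick_restore_method(days_back=5 * 365))   # "keep backup for 5 years"
```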
Backup Storage Redundancy — Architecture
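Treat backup storage redundancy as a creation-time decision. Hyperscale databases can only set it when the database is created. For other Azure SQL Database tiers it can be changed later, but the change applies only to future backups, so existing backups stay on the old redundancy level. Choose wisely based on compliance requirements.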
Active Geo-Replication
Asynchronous replication to up to 4 readable secondaries in any Azure region.
| Aspect | Detail |
|---|---|
| Max secondaries | 4 (any region) |
| Replication | Asynchronous |
| Readable? | ✅ Yes — offload reporting |
| Auto failover? | ❌ No — manual only |
| Endpoint | Each secondary has its own connection string |
| Scope | Per-database |
When to use: You need more than 1 secondary, or you want readable replicas in multiple regions for read locality.
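The practical consequence of "manual failover, per-replica endpoint" is that the application's write endpoint changes after a failover. The toy model below illustrates this; the class and the server names (`sql-eastus` and so on) are hypothetical, and it only tracks roles rather than calling any Azure service.

```python
class GeoReplicatedDatabase:
    """Toy model of one database under active geo-replication."""

    MAX_SECONDARIES = 4

    def __init__(self, primary):
        self.primary = primary
        self.secondaries = []

    def add_secondary(self, server):
        if len(self.secondaries) >= self.MAX_SECONDARIES:
            raise ValueError("active geo-replication allows at most 4 secondaries")
        self.secondaries.append(server)

    def write_endpoint(self):
        return f"{self.primary}.database.windows.net"

    def read_endpoints(self):
        # Each secondary is reachable through its own server name.
        return [f"{s}.database.windows.net" for s in self.secondaries]

    def fail_over_to(self, server):
        """Manual failover: promote `server` and demote the old primary."""
        if server not in self.secondaries:
            raise ValueError(f"{server} is not a secondary")
        self.secondaries.remove(server)
        self.secondaries.append(self.primary)
        self.primary = server


db = GeoReplicatedDatabase("sql-eastus")
for name in ("sql-westus", "sql-westeurope"):
    db.add_secondary(name)

before = db.write_endpoint()
db.fail_over_to("sql-westeurope")
after = db.write_endpoint()
print(before, "->", after)
```

Because `before` and `after` differ, the application has to update its connection string (or resolve the endpoint through its own configuration layer) after every failover.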
For a detailed side-by-side comparison, see Failover Groups vs Geo-Replication.
Auto-Failover Groups (Recommended for DR)
Automatic failover with a single connection endpoint that doesn't change.
Architecture
Connection endpoints (they NEVER change):
- Read-Write: `fog-name.database.windows.net`
- Read-Only: `fog-name.secondary.database.windows.net`
Failover Groups vs Geo-Replication — the key exam differences:
- Failover Groups = auto failover + single endpoint. Geo-Rep = manual failover + per-replica endpoint.
- Failover Groups can include multiple databases. Geo-Rep is per-database.
- Failover Groups have a grace period before auto-failover triggers (default 60 min).
- For MI: Failover Groups replicate ALL user databases (all or nothing).
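The two ideas above, stable listener endpoints and a grace period before automatic failover, can be sketched in a few lines. This is a simplified model under stated assumptions: the function names are hypothetical, and the real service decides when an outage has started and whether to fail over using its own health signals.

```python
from datetime import timedelta


def listener_endpoints(fog_name):
    """Build the two listener names for a failover group on Azure SQL Database."""
    return {
        "read_write": f"{fog_name}.database.windows.net",
        "read_only": f"{fog_name}.secondary.database.windows.net",
    }


def should_auto_failover(outage, grace_period=timedelta(minutes=60)):
    """Simplified rule: fail over once the outage has lasted the full grace period."""
    return outage >= grace_period


endpoints = listener_endpoints("fog-name")
print(endpoints["read_write"])
print(should_auto_failover(timedelta(minutes=15)))  # still inside the grace period
print(should_auto_failover(timedelta(minutes=61)))  # grace period exceeded
```

The endpoints depend only on the failover group name, not on which server currently holds the primary role, which is why applications do not need a connection string change after failover.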
Comparison Table
| Feature | Active Geo-Rep | Failover Group | Always On AG (VM) |
|---|---|---|---|
| Scope | Azure SQL DB | Azure SQL DB + MI | SQL Server on VM |
| Max replicas | 4 | 1 partner | Up to 9 |
| Auto failover | ❌ | ✅ | ✅ (within cluster) |
| Single endpoint | ❌ | ✅ | ✅ (Listener) |
| Cross-region | ✅ | ✅ | ✅ |
| Read-only routing | ✅ | ✅ | ✅ |
| Grace period | N/A | Configurable (default 60 min) | N/A (cluster quorum) |
RTO and RPO — The Business Conversation
| Solution | RPO (data loss) | RTO (downtime) |
|---|---|---|
| PITR (backup restore) | 5-10 min | Hours (depends on DB size) |
| Geo-Restore (GRS backup) | Up to 1 hour | Hours |
| Active Geo-Replication | < 5 seconds | 30 seconds (manual trigger) |
| Failover Groups (auto) | < 5 seconds | Grace period + ~30 seconds |
| BC tier read replica | 0 (synchronous) | < 10 seconds |
| Always On AG (sync) | 0 (synchronous) | < 30 seconds |
Cost vs. RTO/RPO tradeoff: Automated backups (free) give you hours of RTO. Geo-Replication/Failover Groups (extra cost for secondary) give you seconds of RPO and ~30s RTO. Business Critical tier gives you near-zero downtime for local failures. Choose based on business requirements, not maximum capability.
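To make the "choose based on business requirements" advice concrete, the sketch below filters the table by a required RPO and RTO. The numbers are the table's figures converted to seconds. Where the table only says "Hours", a 4-hour placeholder is used, which is purely an assumption for illustration, and the Failover Groups RTO assumes the default 60-minute grace period.

```python
SOLUTIONS = [
    # (name, worst-case RPO in seconds, RTO in seconds)
    ("PITR (backup restore)", 10 * 60, 4 * 3600),      # "Hours": 4 h is a placeholder
    ("Geo-Restore (GRS backup)", 60 * 60, 4 * 3600),   # "Hours": 4 h is a placeholder
    ("Active Geo-Replication", 5, 30),
    ("Failover Groups (auto)", 5, 60 * 60 + 30),       # default grace period + ~30 s
    ("BC tier read replica", 0, 10),
    ("Always On AG (sync)", 0, 30),
]


def candidates(max_rpo_s, max_rto_s):
    """Return the solutions whose RPO and RTO both fit the requirement."""
    return [
        name
        for name, rpo, rto in SOLUTIONS
        if rpo <= max_rpo_s and rto <= max_rto_s
    ]


# Requirement: lose at most 1 minute of data, be back within 1 minute.
print(candidates(max_rpo_s=60, max_rto_s=60))
```

The filter ignores failure scope and cost: a Business Critical replica only protects against local failures, so a regional DR requirement would still need Geo-Replication or a Failover Group even when the numbers fit.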