High Availability & Disaster Recovery

The HA/DR Architecture Map

  • Local HA (Built-in): GP uses Service Fabric with remote storage. BC uses a 4-node Always On AG with local SSD. Both are automatic.
  • Cross-Region DR: Failover Groups (auto failover + single endpoint) or Active Geo-Replication (manual, up to 4 replicas).
  • Backup & Restore: PITR (7-35 days), LTR (up to 10 years), Geo-restore from GRS backup.
🏢 Real-World DBA Note

Oracle DBA parallel: Local HA = RAC. Cross-Region DR = Data Guard. The big difference: in Azure, local HA is built into the service tier, so you don't configure it. You configure only DR (Failover Groups, Geo-Replication).

Backup & Restore — The Safety Net

For a comprehensive deep dive into backup, restore, PITR, and LTR, see the dedicated page: Backup, Restore, PITR & LTR.

Automated Backups (Zero Configuration)

| Backup Type | Frequency | You Configure? |
|---|---|---|
| Full | Weekly | ❌ Automatic |
| Differential | Every 12-24 hours | ❌ Automatic |
| Transaction log | Every 5-10 minutes | ❌ Automatic |

| Setting | Range | Default |
|---|---|---|
| Short-term retention (PITR) | 7-35 days | 7 days |
| Long-term retention (LTR) | Up to 10 years | Not configured |
| Backup storage redundancy | LRS / ZRS / GRS / RA-GRS | GRS |
🎯 Exam Focus

Backup frequencies are exam favorites: Full = weekly, Diff = 12-24h, Log = 5-10 min. Point-in-time restore creates a new database — it NEVER overwrites the existing one. This catches many exam takers.
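The two PITR rules above (a restore point must fall inside the retention window, and the restore always lands in a new database) can be sketched as a small pre-flight check. This is an illustrative model only, not an Azure API: the function names and the database names `salesdb` and `salesdb_restored` are hypothetical, and the 7-35 day range is the one quoted in this guide.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 7    # range quoted in this guide
MAX_RETENTION_DAYS = 35


def earliest_restore_point(now: datetime, retention_days: int = 7) -> datetime:
    """Oldest point in time still covered by short-term (PITR) retention."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 7 and 35")
    return now - timedelta(days=retention_days)


def validate_pitr_request(source_db: str, target_db: str,
                          restore_to: datetime, now: datetime,
                          retention_days: int = 7) -> None:
    """Raise ValueError if this (hypothetical) PITR request cannot be served."""
    if target_db.lower() == source_db.lower():
        # PITR creates a new database; it never overwrites the source.
        raise ValueError("target database name must differ from the source")
    if restore_to > now:
        raise ValueError("cannot restore to a point in the future")
    if restore_to < earliest_restore_point(now, retention_days):
        raise ValueError("restore point is older than the retention window")


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
# "Restore to 3 hours ago" into a new database: passes silently.
validate_pitr_request("salesdb", "salesdb_restored", now - timedelta(hours=3), now)
```

A request that reuses the source name, or that targets a point older than the retention window, raises `ValueError` in this sketch.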

Restore Options

| Scenario | Method | RPO |
|---|---|---|
| "Restore to 3 hours ago" | PITR (point-in-time restore) | Seconds (within retention) |
| "Restore deleted database" | Deleted database restore | To deletion time |
| "Keep backup for 5 years" | Long-Term Retention (LTR) | Weekly/monthly/yearly snapshots |
| "Restore to different region" | Geo-restore from GRS backup | Up to 1 hour lag |

Backup Storage Redundancy — Architecture

  • LRS: 3 copies in one datacenter. Lowest cost. No cross-region protection.
  • ZRS: 3 copies across availability zones. Protects against zone failures.
  • GRS: 6 copies (3 local + 3 in the paired region). Default. Cross-region protection.
  • RA-GRS: GRS + read access to secondary region copies. Highest availability.
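The redundancy options can be treated as a small lookup table that is filtered by protection requirements. The sketch below encodes only the properties listed above; the dictionary layout and the `options_meeting` helper are hypothetical names for illustration, and options that are not covered in this guide are deliberately left out.

```python
from typing import Dict, List

# Properties exactly as described in this guide; nothing else is assumed.
REDUNDANCY: Dict[str, Dict[str, object]] = {
    "LRS":    {"copies": 3, "zone": False, "region": False, "secondary_read": False},
    "ZRS":    {"copies": 3, "zone": True,  "region": False, "secondary_read": False},
    "GRS":    {"copies": 6, "zone": False, "region": True,  "secondary_read": False},
    "RA-GRS": {"copies": 6, "zone": False, "region": True,  "secondary_read": True},
}


def options_meeting(zone: bool = False, region: bool = False,
                    secondary_read: bool = False) -> List[str]:
    """Return the redundancy options that satisfy every requested protection."""
    return [
        name for name, props in REDUNDANCY.items()
        if (not zone or props["zone"])
        and (not region or props["region"])
        and (not secondary_read or props["secondary_read"])
    ]


print(options_meeting(region=True))  # ['GRS', 'RA-GRS']
```

With no requirements, all four options qualify and cost becomes the deciding factor, with LRS the cheapest.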
⚠️ Watch Out

Backup storage redundancy is best decided at database creation time. Hyperscale databases cannot change it afterward, and on other Azure SQL Database tiers a later change applies only to future backups, not to existing ones. Choose based on compliance requirements such as data residency.

Active Geo-Replication

Asynchronous replication to up to 4 readable secondaries in any Azure region.

| Aspect | Detail |
|---|---|
| Max secondaries | 4 (any region) |
| Replication | Asynchronous |
| Readable? | ✅ Yes (offload reporting) |
| Auto failover? | ❌ No (manual only) |
| Endpoint | Each secondary has its own connection string |
| Scope | Per-database |

When to use: You need more than 1 secondary, or you want readable replicas in multiple regions for read locality.

For a detailed side-by-side comparison, see Failover Groups vs Geo-Replication.

Auto-Failover Groups

Automatic failover with a single connection endpoint that doesn't change.

Architecture

Auto-Failover Groups architecture:

  • Single Endpoint: fog-name.database.windows.net never changes on failover. Apps need no connection string update.
  • Automatic Failover: A grace period (default 60 min) elapses before auto-failover. Manual failover is also available.
  • Read-Only Endpoint: fog-name.secondary.database.windows.net routes to the readable secondary for reporting.

Connection endpoints (they NEVER change):

  • Read-Write: fog-name.database.windows.net
  • Read-Only: fog-name.secondary.database.windows.net
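Because both endpoints derive from the failover group name, an application can build its connection strings once and keep them across failovers. The sketch below is a minimal illustration: the helper names and the database name `salesdb` are hypothetical, authentication settings are omitted, and the ODBC Driver 18 keyword style is one common choice rather than the only one.

```python
from typing import Dict


def failover_group_servers(fog_name: str) -> Dict[str, str]:
    """Listener host names for a failover group, as described above."""
    return {
        "read_write": f"{fog_name}.database.windows.net",
        "read_only": f"{fog_name}.secondary.database.windows.net",
    }


def odbc_connection_string(server: str, database: str,
                           read_only: bool = False) -> str:
    """Build an ODBC-style connection string (credentials left out on purpose)."""
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Declares read intent so the session is treated as a read-only workload.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


servers = failover_group_servers("fog-name")
oltp = odbc_connection_string(servers["read_write"], "salesdb")
reporting = odbc_connection_string(servers["read_only"], "salesdb", read_only=True)
print(reporting)
```

Neither string refers to a physical server name, which is why a failover does not require an application change.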
🎯 Exam Focus

Failover Groups vs Geo-Replication — the key exam differences:

  1. Failover Groups = auto failover + single endpoint. Geo-Rep = manual failover + per-replica endpoint.
  2. Failover Groups can include multiple databases. Geo-Rep is per-database.
  3. Failover Groups have a grace period before auto-failover triggers (default 60 min).
  4. For MI: Failover Groups replicate ALL user databases (all or nothing).
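The grace period in point 3 amounts to a simple timing rule: automatic failover is not attempted until the outage has lasted at least as long as the grace period. The sketch below models only that rule. The function names are hypothetical, and the real service applies further conditions that are not modeled here.

```python
from datetime import datetime, timedelta

DEFAULT_GRACE_PERIOD = timedelta(minutes=60)  # default quoted in this guide


def auto_failover_time(outage_started: datetime,
                       grace_period: timedelta = DEFAULT_GRACE_PERIOD) -> datetime:
    """Earliest moment at which automatic failover may trigger."""
    return outage_started + grace_period


def auto_failover_due(outage_started: datetime, now: datetime,
                      grace_period: timedelta = DEFAULT_GRACE_PERIOD) -> bool:
    """True once the outage has lasted for the whole grace period."""
    return now >= auto_failover_time(outage_started, grace_period)


outage = datetime(2024, 6, 15, 9, 0)
print(auto_failover_due(outage, datetime(2024, 6, 15, 9, 45)))  # False
print(auto_failover_due(outage, datetime(2024, 6, 15, 10, 0)))  # True
```

During the first 60 minutes, a manual failover remains the only way to move the primary role sooner.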

Comparison Table

| Feature | Active Geo-Rep | Failover Group | Always On AG (VM) |
|---|---|---|---|
| Scope | Azure SQL DB | Azure SQL DB + MI | SQL Server on VM |
| Max replicas | 4 | 1 partner | Up to 9 |
| Auto failover | ❌ | ✅ | ✅ (within cluster) |
| Single endpoint | ❌ | ✅ | ✅ (Listener) |
| Cross-region | ✅ | ✅ | |
| Read-only routing | | ✅ | |
| Grace period | N/A | Configurable (default 60 min) | N/A (cluster quorum) |

RTO and RPO — The Business Conversation

| Solution | RPO (data loss) | RTO (downtime) |
|---|---|---|
| PITR (backup restore) | 5-10 min | Hours (depends on DB size) |
| Geo-Restore (GRS backup) | Up to 1 hour | Hours |
| Active Geo-Replication | < 5 seconds | 30 seconds (manual trigger) |
| Failover Groups (auto) | < 5 seconds | Grace period + ~30 seconds |
| BC tier read replica | 0 (synchronous) | < 10 seconds |
| Always On AG (sync) | 0 (synchronous) | < 30 seconds |
🏢 Real-World DBA Note

Cost vs. RTO/RPO tradeoff: Automated backups (included with the service, within the backup storage allowance) give you hours of RTO. Geo-Replication/Failover Groups (extra cost for the secondary) give you seconds of RPO and ~30s RTO once failover starts. Business Critical tier gives you near-zero downtime for local failures. Choose based on business requirements, not maximum capability.
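One way to run the business conversation is to encode the RTO/RPO table and filter it by the stated requirements. The sketch below uses the upper bounds from the table. Two modeling assumptions are mine, not the guide's: an RTO of "Hours" is treated as unbounded because it depends on database size, and the Failover Groups RTO uses the default 60-minute grace period plus 30 seconds.

```python
import math
from dataclasses import dataclass
from typing import List

UNBOUNDED = math.inf            # "Hours" in the table: no fixed upper bound
DEFAULT_GRACE_SECONDS = 60 * 60  # default failover group grace period


@dataclass(frozen=True)
class Solution:
    name: str
    rpo_seconds: float  # worst-case data loss
    rto_seconds: float  # worst-case downtime


SOLUTIONS = [
    Solution("PITR (backup restore)", 10 * 60, UNBOUNDED),
    Solution("Geo-Restore (GRS backup)", 60 * 60, UNBOUNDED),
    Solution("Active Geo-Replication", 5, 30),  # 30 s after a manual trigger
    Solution("Failover Groups (auto)", 5, DEFAULT_GRACE_SECONDS + 30),
    Solution("BC tier read replica", 0, 10),
    Solution("Always On AG (sync)", 0, 30),
]


def candidates(max_rpo_seconds: float, max_rto_seconds: float) -> List[str]:
    """Names of the solutions whose worst case fits both requirements."""
    return [
        s.name for s in SOLUTIONS
        if s.rpo_seconds <= max_rpo_seconds and s.rto_seconds <= max_rto_seconds
    ]


print(candidates(max_rpo_seconds=5, max_rto_seconds=60))
# ['Active Geo-Replication', 'BC tier read replica', 'Always On AG (sync)']
```

The filter ignores cost and failure scope (local versus regional), so its output is a shortlist for discussion rather than a recommendation.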


Flashcards

Q: How often are transaction log backups taken in Azure SQL DB?
A: Every 5-10 minutes, automatically.

Quiz

A database needs automatic cross-region failover with a connection string that doesn't change. What should you use?