
Failover Groups vs Active Geo-Replication

Both provide cross-region disaster recovery for Azure SQL, but they work differently and suit different scenarios. This page makes the choice clear.


Side-by-Side Comparison

| Aspect | Active Geo-Replication | Auto-Failover Groups |
|---|---|---|
| What it is | Async readable secondary in any region | Managed failover with a single endpoint |
| Max secondaries | 4 | 1 partner (primary ↔ secondary) |
| Auto failover | ❌ Manual only | ✅ Automatic (after the grace period) |
| Single endpoint | ❌ Each secondary has its own | ✅ fog-name.database.windows.net (never changes) |
| Read-only endpoint | ✅ Per secondary | ✅ fog-name.secondary.database.windows.net |
| Scope | Per database | Multiple databases (group) |
| Failover unit | One database at a time | All databases in the group together |
| DNS update on failover | ❌ Must update connection strings | ✅ Automatic DNS flip |
| Grace period | N/A | Configurable (default and minimum: 60 min) |
| Works with SQL DB | ✅ | ✅ |
| Works with MI | ❌ | ✅ (all user DBs, all-or-nothing) |
| Replication | Async | Async |
| RPO | < 5 seconds | < 5 seconds |
| RTO | Manual trigger + ~30 sec | Grace period + ~30 sec |
| Cost | Secondary DB billed | Secondary DB billed |

What Each Option Is

Active Geo-Replication

A per-database feature that creates an asynchronous readable copy in another region. You manage failover manually. Each secondary has its own connection string.

Think of it as: "I want read replicas in other regions, and I'll handle failover myself."

Auto-Failover Groups

A managed group of databases with automatic failover, a single DNS endpoint that never changes, and built-in read/write + read-only listeners.

Think of it as: "I want Azure to handle everything (one endpoint, automatic failover, multiple databases failing over together)."


Endpoint Behavior

This is the biggest practical difference and the most-tested point on DP-300.

Geo-Replication Endpoints

| Database | Endpoint |
|---|---|
| Primary | server1.database.windows.net |
| Secondary 1 | server2-region2.database.windows.net |
| Secondary 2 | server3-region3.database.windows.net |

Problem: On failover, your app must update its connection string to point to the new primary.

Failover Group Endpoints

| Endpoint | Points to | Changes on failover? |
|---|---|---|
| fog-name.database.windows.net | Current primary | ❌ Never changes |
| fog-name.secondary.database.windows.net | Current secondary | ❌ Never changes |

Advantage: Your app connects to the fog endpoint and never needs to change connection strings, even after failover. DNS flips automatically.
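A minimal sketch of what "connect to the fog endpoint" looks like from the application side, assuming the Microsoft ODBC Driver 18 for SQL Server. The names fog-name and appdb are hypothetical placeholders, and the helper fog_connection_string is illustrative, not part of any Azure SDK. Read-write traffic targets the primary listener; reporting traffic targets the secondary listener and declares read-only intent with the ApplicationIntent keyword.

```python
def fog_connection_string(fog_name: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC connection string aimed at a failover group listener.

    Hypothetical helper: fog_name and database are placeholders.
    The read-write listener resolves to the current primary; the
    read-only listener resolves to the current secondary.
    """
    if read_only:
        host = f"{fog_name}.secondary.database.windows.net"
    else:
        host = f"{fog_name}.database.windows.net"
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{host},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Declare read-only intent for sessions on the secondary listener.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


rw = fog_connection_string("fog-name", "appdb")
ro = fog_connection_string("fog-name", "appdb", read_only=True)
print(rw)
print(ro)
```

Neither string contains a physical server name, which is why nothing needs editing after a failover.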

🎯 Exam Focus

DP-300 loves this question: "How do you ensure application connection strings don't change during failover?" → Auto-Failover Groups. The fog endpoint abstracts the primary/secondary; DNS updates happen automatically behind the scenes.


Failover Behavior

Geo-Replication Failover

  1. You detect the outage (or Azure notifies you)
  2. You manually trigger failover to a specific secondary
  3. That secondary becomes the new primary
  4. You update application connection strings to the new server
  5. The old primary (when recovered) becomes a secondary
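The steps above can be modeled as a toy state machine, a sketch only: the class GeoReplicatedDatabase is hypothetical, not an Azure API, and the server names are the illustrative ones from the endpoint table. It shows why step 4 exists: the application holds a physical server name that stops being the primary after a manual failover.

```python
from dataclasses import dataclass, field


@dataclass
class GeoReplicatedDatabase:
    """Toy model of one database with geo-replicas (hypothetical, not an Azure API)."""
    primary: str
    secondaries: list[str] = field(default_factory=list)

    def failover(self, target: str) -> None:
        # Step 2: you choose a specific secondary to promote.
        if target not in self.secondaries:
            raise ValueError(f"{target} is not a secondary")
        self.secondaries.remove(target)
        # Step 5: the old primary rejoins as a secondary.
        self.secondaries.append(self.primary)
        # Step 3: the chosen secondary becomes the new primary.
        self.primary = target


db = GeoReplicatedDatabase(
    primary="server1.database.windows.net",
    secondaries=[
        "server2-region2.database.windows.net",
        "server3-region3.database.windows.net",
    ],
)
app_server = db.primary  # the app is configured with a physical server name
db.failover("server2-region2.database.windows.net")
stale = app_server != db.primary  # True: the app now points at a secondary
app_server = db.primary  # step 4: someone must update the connection string
```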

Failover Group Failover

  1. Azure detects the outage
  2. Grace period passes (default 60 min, configurable)
  3. Automatic failover triggers (or you trigger it manually before the grace period expires)
  4. The listener endpoint fog-name.database.windows.net automatically points to the new primary
  5. Applications don't need any changes
  6. The old primary (when recovered) automatically becomes the secondary
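For contrast with the geo-replication steps, here is a toy model of the failover group sequence, again a sketch: FailoverGroup is a hypothetical class, not an Azure API, and server1/server2 are placeholder names. The listener hostname the app uses is fixed; only the server it resolves to changes, and the roles of the whole group swap in one action.

```python
from dataclasses import dataclass


@dataclass
class FailoverGroup:
    """Toy model of a failover group's two listeners (hypothetical, not an Azure API)."""
    name: str
    primary_server: str
    secondary_server: str

    def resolve(self, hostname: str) -> str:
        # Listener names never change; only their targets do.
        if hostname == f"{self.name}.database.windows.net":
            return self.primary_server
        if hostname == f"{self.name}.secondary.database.windows.net":
            return self.secondary_server
        raise KeyError(hostname)

    def failover(self) -> None:
        # Every database in the group switches roles together (steps 3 to 6).
        self.primary_server, self.secondary_server = (
            self.secondary_server,
            self.primary_server,
        )


fog = FailoverGroup(
    "fog-name", "server1.database.windows.net", "server2.database.windows.net"
)
app_host = "fog-name.database.windows.net"  # the app never edits this
before = fog.resolve(app_host)
fog.failover()
after = fog.resolve(app_host)
```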
⚠️ Watch Out

Grace period trap: The default grace period is 60 minutes, which is also the minimum you can configure. Azure waits that long after an outage before failing over automatically, and the primary stays unavailable the whole time. You can't set it below 1 hour, but you can trigger a manual failover at any point. Decide between waiting and failing over manually based on your RTO requirement.
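The grace-period arithmetic can be checked with a few lines, assuming the figures this page uses: a 1-hour minimum grace period and roughly 30 seconds for the failover itself. Both constants and both function names are illustrative, not values read from Azure.

```python
from datetime import timedelta

MIN_GRACE_PERIOD = timedelta(hours=1)      # minimum grace period, per this page
FAILOVER_DURATION = timedelta(seconds=30)  # approximate failover time, per this page


def worst_case_auto_rto(grace_period: timedelta) -> timedelta:
    """Outage-to-recovery time if nobody intervenes and automatic failover fires."""
    if grace_period < MIN_GRACE_PERIOD:
        raise ValueError("grace period cannot be shorter than 1 hour")
    return grace_period + FAILOVER_DURATION


def needs_manual_failover(rto_target: timedelta, grace_period: timedelta) -> bool:
    """True when waiting for automatic failover would exceed the RTO target."""
    return worst_case_auto_rto(grace_period) > rto_target


print(worst_case_auto_rto(timedelta(hours=1)))  # 1:00:30
print(needs_manual_failover(timedelta(minutes=15), timedelta(hours=1)))  # True
```

Any RTO target under about an hour therefore implies a runbook for manual failover, not a shorter grace period.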


Scope: Per-Database vs Group

| Scenario | Geo-Replication | Failover Groups |
|---|---|---|
| Fail over one database | ✅ | ❌ (all or nothing) |
| Fail over 10 databases together consistently | ❌ (10 separate failovers) | ✅ (one action) |
| Different databases in different regions | ✅ (flexible) | ❌ (one partner region) |
| Mix of databases with and without DR | ✅ (per-DB choice) | ⚠️ SQL DB: only databases you add to the group; MI: all user databases |
🏢 Real-World DBA Note

Production pattern: Use Failover Groups for your core application databases that must fail over together (consistency). Use Geo-Replication for analytics/reporting replicas that don't need automatic failover.


Managed Instance Differences

| Feature | SQL DB | MI |
|---|---|---|
| Geo-Replication | ✅ | ❌ Not supported |
| Failover Groups | ✅ | ✅ |
| FG scope | Selected databases | ALL user databases (all-or-nothing) |
| FG limit | Multiple groups per server | 1 failover group per MI |
🎯 Exam Focus

MI critical facts: MI does NOT support Active Geo-Replication, only Failover Groups. And MI Failover Groups replicate ALL user databases; you can't choose which ones. Only 1 failover group per MI.


Choose This When...

| Requirement | Choose |
|---|---|
| "Connection strings must never change on failover" | Failover Groups |
| "Failover must be automatic" | Failover Groups |
| "I need readable replicas in 3+ regions" | Geo-Replication (up to 4 secondaries) |
| "Multiple databases must fail over as a unit" | Failover Groups |
| "I want control over which databases have DR" | Geo-Replication (per database) |
| "I use Managed Instance" | Failover Groups (only option) |
| "I need the simplest setup" | Failover Groups |
| "I need read locality in multiple regions" | Geo-Replication (secondary per region) |
| "Recommended by Microsoft for production DR" | Failover Groups |
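The requirement table can be encoded as a small decision function, a study aid sketch rather than official guidance. The function name and every parameter are hypothetical; secondary_regions counts regions that need a readable secondary, excluding the primary's own region. Requirements drawn from both columns are reported as a conflict instead of silently picking one.

```python
def choose_dr_option(
    *,
    managed_instance: bool = False,
    stable_endpoint: bool = False,
    automatic_failover: bool = False,
    group_failover: bool = False,
    secondary_regions: int = 1,
    per_database_control: bool = False,
) -> str:
    """Map requirements to a DR option following the table (hypothetical study aid)."""
    if secondary_regions < 1:
        raise ValueError("need at least one secondary region")
    if secondary_regions > 4:
        raise ValueError("Active Geo-Replication supports at most 4 secondaries")
    if managed_instance:
        # MI has no Active Geo-Replication; a failover group is the only option.
        return "Failover Groups"
    wants_fog = stable_endpoint or automatic_failover or group_failover
    # A failover group has exactly one partner region; more regions need geo-replicas.
    wants_geo = secondary_regions > 1 or per_database_control
    if wants_fog and wants_geo:
        return "Conflict: review the trade-offs"
    if wants_geo:
        return "Active Geo-Replication"
    # Default: the option recommended for production DR, per the table.
    return "Failover Groups"


print(choose_dr_option(stable_endpoint=True))  # Failover Groups
print(choose_dr_option(secondary_regions=3))   # Active Geo-Replication
```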

Common Misconceptions

| Misconception | Reality |
|---|---|
| "Geo-Replication has automatic failover" | ❌ Manual only. Failover Groups have automatic failover. |
| "Failover Groups support 4 secondaries" | ❌ Only 1 partner. Geo-Replication supports up to 4. |
| "I can choose which MI databases to replicate" | ❌ MI Failover Groups replicate ALL user databases. |
| "Failover is instant" | ❌ Automatic failover waits out the grace period (default 60 min), then takes ~30 sec. |
| "Both protect against accidental DELETE" | ❌ Neither does. Deletions replicate. Use PITR. |
| "Geo-Replication works on MI" | ❌ MI only supports Failover Groups. |
| "Failover Group DNS takes hours to update" | ❌ The listener DNS TTL is 30 seconds. Failover is fast once triggered. |
| "I need both for full DR" | ❌ Usually one or the other. FG is sufficient for most production scenarios. |

RPO/RTO Summary

| Solution | RPO | RTO |
|---|---|---|
| Geo-Replication (manual failover) | < 5 sec | Manual trigger + ~30 sec |
| Failover Groups (auto) | < 5 sec | Grace period + ~30 sec |
| Failover Groups (manual) | < 5 sec | ~30 sec |
| PITR (for comparison) | 5-10 min | Hours |

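Both have identical RPO (< 5 seconds) because both use async replication. The RTO difference comes down to who pulls the trigger: with Geo-Rep, RTO depends on how fast you detect the outage and react; with Failover Groups, Azure waits out the grace period and then fails over automatically, unless you fail over manually first.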
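Why async replication means a nonzero RPO can be shown with a simplified model, a sketch that assumes a constant replication lag (real lag varies): a commit at time t reaches the secondary at about t + lag, so a forced failover loses whatever was committed inside the lag window before the outage. The function name and the sample timestamps are hypothetical; the 5-second lag mirrors the RPO figure above. The same arithmetic applies to Geo-Rep and Failover Groups, since both replicate asynchronously.

```python
def lost_on_forced_failover(
    commit_times_s: list[float], outage_at_s: float, lag_s: float
) -> list[float]:
    """Commits not yet applied on the secondary when the primary is lost.

    Assumes a constant replication lag of lag_s seconds (simplification).
    """
    cutoff = outage_at_s - lag_s
    return [t for t in commit_times_s if cutoff < t <= outage_at_s]


commits = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical commit timestamps (s)
lost = lost_on_forced_failover(commits, outage_at_s=10.0, lag_s=5.0)
print(lost)  # [6.0, 8.0, 10.0]
```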


Anti-Patterns

  • "Geo-Rep + manual app DNS = same as a FOG." Functionally close, but your DNS swap is now your RTO floor. Traffic Manager / Front Door TTLs add anywhere from 30 seconds to minutes. The FOG endpoint switches in seconds inside Azure DNS.
  • "Set grace period to 0 to fail over fast." Not allowed (the minimum is 1 hour), and risky anyway: transient SQL connectivity blips would fail you over to the paired region, and then you'd have to fail back. Keep the 60-min default for production; faster recovery is what manual forced failover is for.
  • "Use a FOG to fail over one DB out of many." Impossible. A FOG fails over all member DBs together. If the apps are independent, give each its own FOG, not one FOG with all DBs.
  • "MI Geo-Replication." Doesn't exist. MI only supports Failover Groups (instance scope, all databases). The exam plants "move a single MI database to another region" as a distractor; the answer is "not supported."
  • "FOG protects against bad deploys." No. A bad schema change replicates to the secondary in seconds. PITR is the only protection against logical corruption or an accidental DROP.
  • "Read-only endpoint on FOG = free read scale." It isn't free (the secondary is billed), and it doesn't stay put: the read-only listener follows the secondary role. After a failover, the secondary is the old primary in the other region, so app reads can suddenly cross regions and latency spikes silently.
⚠️ Watch Out

FOG endpoints survive failover; database-level Geo-Rep endpoints don't. This is the single most asked DP-300 question on this topic.


Migration Between Options

| From → To | Path | Cost / impact |
|---|---|---|
| Geo-Replication → Failover Group | Wrap the existing replica in a new FOG that references the same secondary | Online; no reseed; gain auto-failover + endpoints |
| Failover Group → Geo-Replication | Drop the FOG; the existing geo-replica remains | Online; lose auto-failover + listener endpoints; app must update connection strings |
| FOG (1 DB) → FOG (multi-DB) | Add DBs to the existing FOG | Each new DB triggers a one-time seed |
| FOG → different paired region | Drop and recreate; fresh seed | Compute cost during seed; no failover capability until the new secondary is seeded |
| Geo-Rep (sync) → Geo-Rep (async, default) | N/A: Geo-Rep is always async | Trap question: there is no sync Geo-Rep on SQL DB |
| Active Geo-Rep + custom DNS → FOG | Configure the FOG, retire the custom DNS, point the app at the FOG endpoints | Days of parallel running recommended; rollback = restore the DNS records |

Moves between the two are cheap (no data reseed in most paths). The one-way pain is changing the paired region, which forces a fresh secondary.


Real Scenarios

  1. Single SQL DB, 99.99% SLA, app must auto-recover → Failover Group, grace period 60 min. Driver: app uses the FOG endpoint, no code change on failover.
  2. 3 logically independent SaaS DBs on the same server, each with its own app → 3 separate Failover Groups. Driver: each app fails over on its own. Trade-off: 3 secondaries to pay for.
  3. Reporting workload reading from a slightly stale secondary 200 km away → Active Geo-Replication, manual failover only, app talks to the readable secondary directly. Driver: no need for auto-failover; reads off-loaded.
  4. MI hosting 80 databases for a legacy ERP → MI Failover Group. Driver: only option for MI. Trade-off accepted: all 80 DBs fail over together, no per-DB control.
  5. App depending on cross-DB queries within one server → One FOG containing all related DBs. Driver: cross-DB queries break if databases fail over independently and end up in different regions.
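Scenarios 2 and 5 follow one rule: databases linked by a dependency must share a failover group, and unrelated databases can get their own. A sketch of that grouping using union-find (a technique chosen here for illustration, not an Azure feature); the function name and the database names are hypothetical. On MI this planning doesn't apply, because the single instance-level group always contains every user database.

```python
def plan_failover_groups(
    databases: list[str], dependencies: list[tuple[str, str]]
) -> list[list[str]]:
    """Group databases so any two linked by a dependency share a failover group."""
    parent = {db: db for db in databases}

    def find(db: str) -> str:
        # Walk to the set representative, halving the path as we go.
        while parent[db] != db:
            parent[db] = parent[parent[db]]
            db = parent[db]
        return db

    for a, b in dependencies:
        parent[find(a)] = find(b)  # merge the two dependency sets

    groups: dict[str, list[str]] = {}
    for db in databases:
        groups.setdefault(find(db), []).append(db)
    return sorted(sorted(members) for members in groups.values())


# Scenario 2: independent SaaS databases -> one failover group each.
print(plan_failover_groups(["saas_a", "saas_b", "saas_c"], []))
# Scenario 5: cross-DB queries chain orders, inventory and billing together.
print(plan_failover_groups(
    ["orders", "inventory", "billing", "audit"],
    [("orders", "inventory"), ("inventory", "billing")],
))
```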

Flashcards

What is the biggest practical difference between Geo-Replication and Failover Groups?
Failover Groups provide a single DNS endpoint that never changes on failover. Geo-Replication requires updating connection strings to point to the new primary.

Quiz

An application needs cross-region DR with zero connection string changes on failover. What should you use?