Failover Groups vs Active Geo-Replication

Both provide cross-region disaster recovery for Azure SQL — but they work differently and suit different scenarios. This page makes the choice clear.

Side-by-Side Comparison

Aspect	Active Geo-Replication	Auto-Failover Groups
What it is	Async readable secondary in any region	Managed failover with single endpoint
Max secondaries	4	1 partner (primary ↔ secondary)
Auto failover	❌ Manual only	✅ Automatic (with grace period)
Single endpoint	❌ Each secondary has its own	✅ `fog.database.windows.net` (never changes)
Read-only endpoint	✅ Per secondary	✅ `fog.secondary.database.windows.net`
Scope	Per database	Multiple databases (group)
Failover unit	One database at a time	All databases in the group together
DNS update on failover	❌ Must update connection strings	✅ Automatic DNS flip
Grace period	N/A	Configurable (default 60 min)
Works with SQL DB	✅	✅
Works with MI	❌	✅ (all user DBs, all-or-nothing)
Replication	Async	Async
RPO	< 5 seconds	< 5 seconds
RTO	Manual trigger (~30 sec)	Grace period + ~30 sec
Cost	Secondary DB billed	Secondary DB billed

What Each Option Is

Active Geo-Replication

A per-database feature that creates an asynchronous readable copy in another region. You manage failover manually. Each secondary has its own connection string.

Think of it as: "I want read replicas in other regions, and I'll handle failover myself."

Auto-Failover Groups

A managed group of databases with automatic failover, a single DNS endpoint that never changes, and built-in read/write + read-only listeners.

Think of it as: "I want Azure to handle everything — one endpoint, automatic failover, multiple databases fail over together."

Endpoint Behavior

This is the biggest practical difference and the most frequently tested point on DP-300.

Geo-Replication Endpoints

Database	Endpoint
Primary	`server1.database.windows.net`
Secondary 1	`server2-region2.database.windows.net`
Secondary 2	`server3-region3.database.windows.net`

Problem: On failover, your app must update its connection string to point to the new primary.

Failover Group Endpoints

Endpoint	Points To	Changes on Failover?
`fog-name.database.windows.net`	Current primary	❌ Never changes
`fog-name.secondary.database.windows.net`	Current secondary	❌ Never changes

Advantage: Your app connects to the fog endpoint and never needs to change connection strings, even after failover. DNS flips automatically.
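
As a minimal sketch of what "connect to the fog endpoint" looks like in practice, the helpers below derive both listener hostnames from a failover group name and build the two connection strings once, at deploy time. The group name `fog-name`, the database name `appdb`, the helper names, and the ODBC-style string layout are all assumptions for illustration; credentials are omitted. `ApplicationIntent=ReadOnly` is added on the read-only connection, which is the usual way to mark read-only traffic.

```python
DNS_ZONE = "database.windows.net"


def listener_hosts(fog_name: str) -> dict[str, str]:
    """Return the read-write and read-only listener hostnames for a failover group."""
    return {
        "read_write": f"{fog_name}.{DNS_ZONE}",
        "read_only": f"{fog_name}.secondary.{DNS_ZONE}",
    }


def connection_string(host: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC-style connection string (credentials omitted in this sketch)."""
    parts = [
        f"Server=tcp:{host},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Marks the session as read-only traffic for the secondary listener.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


hosts = listener_hosts("fog-name")           # hypothetical failover group name
rw = connection_string(hosts["read_write"], "appdb")
ro = connection_string(hosts["read_only"], "appdb", read_only=True)
print(rw)
print(ro)
```

Neither string contains a server name, which is why nothing in the application configuration has to change when the roles swap.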

🎯 Exam Focus

DP-300 loves this question: "How do you ensure application connection strings don't change during failover?" → Auto-Failover Groups. The fog endpoint abstracts the primary/secondary — DNS updates happen automatically behind the scenes.

Failover Behavior

Geo-Replication Failover

1. You detect the outage (or Azure notifies you).
2. You manually trigger failover to a specific secondary.
3. That secondary becomes the new primary.
4. You update application connection strings to the new server.
5. The old primary (when recovered) becomes a secondary.
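
The steps above can be sketched with a toy model. This is not an Azure API: `GeoReplicatedDatabase` and `failover_to` are hypothetical names, and the server names are the example endpoints from the table earlier on this page. The point it illustrates is that the promotion itself is only part of the work, because the application is still pointing at the old server afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class GeoReplicatedDatabase:
    """Toy model of one database with Active Geo-Replication (not an Azure API)."""
    primary: str
    secondaries: list[str] = field(default_factory=list)

    def failover_to(self, target: str) -> str:
        """Promote `target`; the old primary rejoins as a secondary."""
        if target not in self.secondaries:
            raise ValueError(f"{target} is not a secondary of this database")
        self.secondaries.remove(target)
        self.secondaries.append(self.primary)
        self.primary = target
        return self.primary


db = GeoReplicatedDatabase(
    primary="server1.database.windows.net",
    secondaries=[
        "server2-region2.database.windows.net",
        "server3-region3.database.windows.net",
    ],
)

app_server = db.primary  # what the app's connection string points at today
new_primary = db.failover_to("server2-region2.database.windows.net")

# The app still targets the old server until its configuration is changed.
needs_update = app_server != new_primary
app_server = new_primary  # the manual connection-string update
print(needs_update, app_server)
```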

Failover Group Failover

1. Azure detects the outage.
2. The grace period passes (default 60 min, configurable).
3. Automatic failover triggers (or you trigger it manually before the grace period expires).
4. The DNS endpoint `fog-name.database.windows.net` automatically points to the new primary.
5. Applications don't need any changes.
6. The old primary (when recovered) automatically becomes the secondary.
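
A comparable toy model for a failover group, again with hypothetical names (`FailoverGroup`, `resolve`) and not an Azure API. Here `resolve` stands in for the DNS lookup: the listener names are fixed, and only what they resolve to changes when the roles swap.

```python
from dataclasses import dataclass


@dataclass
class FailoverGroup:
    """Toy model of a failover group and its two listeners (not an Azure API)."""
    name: str
    primary_server: str
    secondary_server: str

    @property
    def read_write_listener(self) -> str:
        return f"{self.name}.database.windows.net"

    @property
    def read_only_listener(self) -> str:
        return f"{self.name}.secondary.database.windows.net"

    def resolve(self, listener: str) -> str:
        """Stand-in for DNS: map a listener name to the server currently behind it."""
        if listener == self.read_write_listener:
            return self.primary_server
        if listener == self.read_only_listener:
            return self.secondary_server
        raise KeyError(listener)

    def failover(self) -> None:
        """Swap roles; the listener names themselves are untouched."""
        self.primary_server, self.secondary_server = (
            self.secondary_server,
            self.primary_server,
        )


fog = FailoverGroup(
    name="fog-name",
    primary_server="server1.database.windows.net",
    secondary_server="server2-region2.database.windows.net",
)

app_endpoint = fog.read_write_listener   # the only name the app ever stores
before = fog.resolve(app_endpoint)
fog.failover()
after = fog.resolve(app_endpoint)
print(app_endpoint, before, after)
```

After the swap, the read-only listener resolves to the former primary, so read traffic moves regions too.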

⚠️ Watch Out

Grace period trap: The default grace period is 60 minutes, so Azure waits 60 minutes before failing over automatically, and the database is unavailable for that whole time. One hour is also the lowest value you can set, so the wait cannot be shortened; if your RTO is tighter than that, trigger a manual failover immediately instead of waiting. Set the grace period based on your RTO requirement.

Scope: Per-Database vs Group

Scenario	Geo-Replication	Failover Groups
Fail over one database	✅	❌ (all or nothing)
Fail over 10 databases together consistently	❌ (10 separate failovers)	✅ (one action)
Different databases in different regions	✅ (flexible)	❌ (one partner region)
Mix of databases with and without DR	✅ (per DB choice)	❌ (all in group)

🏢 Real-World DBA Note

Production pattern: Use Failover Groups for your core application databases that must fail over together (consistency). Use Geo-Replication for analytics/reporting replicas that don't need automatic failover.

Managed Instance Differences

Feature	SQL DB	MI
Geo-Replication	✅	❌ Not supported
Failover Groups	✅	✅
FG scope	Selected databases	ALL user databases (all-or-nothing)
FG limit	Multiple groups per server	1 failover group per MI

🎯 Exam Focus

MI critical facts: MI does NOT support Active Geo-Replication — only Failover Groups. And MI Failover Groups replicate ALL user databases — you can't choose which ones. Only 1 failover group per MI.

Choose This When...

Requirement	Choose
"Connection strings must never change on failover"	Failover Groups
"Failover must be automatic"	Failover Groups
"I need readable replicas in 3+ regions"	Geo-Replication (up to 4 secondaries)
"Multiple databases must fail over as a unit"	Failover Groups
"I want control over which databases have DR"	Geo-Replication (per database)
"I use Managed Instance"	Failover Groups (only option)
"I need the simplest setup"	Failover Groups
"I need read locality in multiple regions"	Geo-Replication (secondary per region)
"Recommended by Microsoft for production DR"	Failover Groups

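The table above can be read as a simple lookup. The sketch below encodes it that way; the requirement keys and the `recommend` function are hypothetical names invented for this example, and the fallback to Failover Groups when nothing is specified reflects the "recommended for production DR" row.

```python
FAILOVER_GROUPS = "Failover Groups"
GEO_REPLICATION = "Geo-Replication"

# One entry per row of the table; keys are made-up identifiers for this sketch.
RULES = {
    "stable_connection_strings": FAILOVER_GROUPS,
    "automatic_failover": FAILOVER_GROUPS,
    "readable_replicas_in_3_plus_regions": GEO_REPLICATION,
    "databases_fail_over_as_unit": FAILOVER_GROUPS,
    "per_database_dr_control": GEO_REPLICATION,
    "managed_instance": FAILOVER_GROUPS,
    "simplest_setup": FAILOVER_GROUPS,
    "read_locality_in_multiple_regions": GEO_REPLICATION,
}


def recommend(requirements: set[str]) -> list[str]:
    """Return the option(s) the requirements point to, sorted by name."""
    unknown = requirements - set(RULES)
    if unknown:
        raise ValueError(f"unknown requirements: {sorted(unknown)}")
    if "managed_instance" in requirements:
        # Managed Instance has no Geo-Replication, so nothing else matters.
        return [FAILOVER_GROUPS]
    choices = {RULES[r] for r in requirements}
    # No stated requirement: fall back to the production default.
    return sorted(choices) if choices else [FAILOVER_GROUPS]


print(recommend({"automatic_failover", "stable_connection_strings"}))
print(recommend({"readable_replicas_in_3_plus_regions"}))
print(recommend({"automatic_failover", "per_database_dr_control"}))
```

When both options come back, the requirements pull in different directions and one of them has to give.
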
Common Misconceptions

Misconception	Reality
"Geo-Replication has automatic failover"	❌ Manual only. Failover Groups have automatic.
"Failover Groups support 4 secondaries"	❌ Only 1 partner. Geo-Replication supports up to 4.
"I can choose which MI databases to replicate"	❌ MI Failover Groups replicate ALL user databases.
"Failover is instant"	❌ Grace period (default 60 min) + ~30 sec failover.
"Both protect against accidental DELETE"	❌ Neither does. Deletion replicates. Use PITR.
"Geo-Replication works on MI"	❌ MI only supports Failover Groups.
"Failover Group DNS takes hours to update"	❌ DNS TTL is 30 seconds. Failover is fast once triggered.
"I need both for full DR"	❌ Usually one or the other. FG is sufficient for most production scenarios.

RPO/RTO Summary

Solution	RPO	RTO
Geo-Replication (manual failover)	< 5 sec	Manual trigger + ~30 sec
Failover Groups (auto)	< 5 sec	Grace period + ~30 sec
Failover Groups (manual)	< 5 sec	~30 sec
PITR (for comparison)	5-10 min	Hours

Both have identical RPO (< 5 seconds) because both use async replication. The RTO difference is the grace period — Geo-Rep depends on how fast you react, Failover Groups wait for the grace period then auto-failover.
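
The RTO arithmetic can be worked through in a few lines. This is a rough sketch: the mode names, the function, and the `reaction_min` parameter (how long a human takes to notice the outage and trigger failover) are assumptions for illustration, and the 30-second failover time is the approximate figure from the table.

```python
FAILOVER_SECONDS = 30  # the "~30 sec" failover time quoted in the table


def estimated_rto_seconds(
    mode: str, grace_period_min: float = 60, reaction_min: float = 0
) -> float:
    """Estimate RTO in seconds for the three failover paths in the table."""
    if mode == "fog_auto":
        # Automatic failover waits out the whole grace period first.
        return grace_period_min * 60 + FAILOVER_SECONDS
    if mode in ("fog_manual", "geo_manual"):
        # Manual paths are bounded by how quickly someone reacts.
        return reaction_min * 60 + FAILOVER_SECONDS
    raise ValueError(f"unknown mode: {mode}")


for mode, kwargs in [
    ("fog_auto", {"grace_period_min": 60}),
    ("fog_manual", {"reaction_min": 5}),
    ("geo_manual", {"reaction_min": 10}),
]:
    print(mode, estimated_rto_seconds(mode, **kwargs))
```

With the default grace period, the automatic path comes to a little over an hour, while a manual failover triggered five minutes into the outage finishes in under six minutes.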

Anti-Patterns

"Geo-Rep + manual app DNS = same as a FOG." Functionally close, but your DNS swap is now your RTO floor. Traffic Manager / Front Door TTLs add 30 s to minutes. FOG endpoint switches in seconds inside Azure DNS.
"Set grace period to 0 to fail over fast." Not possible: the lowest grace period you can set is 1 hour, which is also the default. The delay is deliberate. Without it, a transient SQL connectivity blip would fail you over to the paired region, and then you would have to fail back. Faster recovery is what manual forced failover is for.
"Use a FOG to fail over one DB out of many." Impossible. FOG fails over all member DBs together. If the apps are independent, give each its own FOG, not one FOG with all DBs.
"MI Geo-Replication." Doesn't exist. MI only supports Failover Groups (instance-scope, all databases). The exam plants "move a single MI database to another region" as a distractor; the answer is "not supported."
"FOG protects against bad deploys." No. The bad schema change replicates to the secondary in seconds. PITR is the only protection against logical corruption / accidental DROP.
"Read-only endpoint on FOG = free read scale." It points to the current secondary. After a failover the roles swap, so the read-only endpoint resolves to the old primary in the other region. Readers placed near the old secondary now cross regions, and latency spikes silently.

⚠️ Watch Out

FOG endpoints survive failover; database-level Geo-Rep endpoints don't. This is the single most asked DP-300 question on this topic.

Migration Between Options

From → To	Path	Cost
Geo-Replication → Failover Group	Wrap existing replica into a new FOG referencing same secondary	Online; no reseed; gain auto-failover + endpoint
Failover Group → Geo-Replication	Drop FOG; existing geo-replica remains	Online; lose auto-failover + endpoints; app must update DNS
FOG (1 DB) → FOG (multi-DB)	Add DBs to existing FOG	Each new DB triggers a one-time seed
FOG → Different paired region	Drop + recreate; fresh seed	Compute cost during seed; brief unavailability of failover capability
Geo-Rep (sync) → Geo-Rep (async, default)	N/A — Geo-Rep is always async	(Trap question — there is no sync Geo-Rep on SQL DB)
Active Geo-Rep + custom DNS → FOG	Configure FOG, retire DNS, app uses FOG endpoints	Days of parallel-running recommended; rollback = bring DNS back

Moves between the two are cheap (no data reseed in most paths). The one costly move is changing the paired region, which forces a fresh secondary and a full seed.

Real Scenarios

Single SQL DB, 99.99 % SLA, app must auto-recover → Failover Group, grace period 60 min. Driver: app uses FOG endpoint, no code change on failover.
3 logically independent SaaS DBs in same server, each with its own app → 3 separate Failover Groups. Driver: each app fails over on its own. Trade-off: 3 secondaries to pay for.
Reporting workload reading from a stale secondary 200 km away → Active Geo-Replication, manual failover only, app talks to readable secondary directly. Driver: no need for auto-failover; reads off-loaded.
MI hosting 80 databases for a legacy ERP → MI Failover Group. Driver: only option for MI. Trade-off accepted: all 80 DBs fail over together, no per-DB control.
App depending on cross-DB queries within one server → One FOG containing all related DBs. Driver: cross-DB queries break if databases fail over independently and end up in different regions.
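
The grouping rule behind the SaaS and cross-DB scenarios above (independent databases get their own group, databases linked by cross-DB queries share one) can be sketched as a small union-find. The function name, database names, and the dependency list are hypothetical; this is a planning aid under those assumptions, not something Azure provides.

```python
def plan_failover_groups(
    databases: list[str], cross_db_links: list[tuple[str, str]]
) -> list[list[str]]:
    """Group databases so that any two linked by cross-DB queries share a group."""
    parent = {db: db for db in databases}

    def find(db: str) -> str:
        # Walk up to the group representative, shortening the path as we go.
        while parent[db] != db:
            parent[db] = parent[parent[db]]
            db = parent[db]
        return db

    for a, b in cross_db_links:
        parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for db in databases:
        groups.setdefault(find(db), []).append(db)
    return sorted(sorted(members) for members in groups.values())


plan = plan_failover_groups(
    databases=["billing", "crm", "orders", "inventory"],
    cross_db_links=[("orders", "inventory")],  # orders queries inventory
)
print(plan)
```

Each inner list is one candidate failover group; every extra group means another secondary to pay for, which is the trade-off noted in the SaaS scenario.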

Flashcards

What is the biggest practical difference between Geo-Replication and Failover Groups?

Failover Groups provide a single DNS endpoint that never changes on failover. Geo-Replication requires updating connection strings to point to the new primary.
