Failover Groups vs Active Geo-Replication
Both provide cross-region disaster recovery for Azure SQL, but they work differently and suit different scenarios. This page makes the choice clear.
Side-by-Side Comparison
| Aspect | Active Geo-Replication | Auto-Failover Groups |
|---|---|---|
| What it is | Async readable secondary in any region | Managed failover with single endpoint |
| Max secondaries | 4 | 1 partner (primary ↔ secondary) |
| Auto failover | ❌ Manual only | ✅ Automatic (with grace period) |
| Single endpoint | ❌ Each secondary has its own | ✅ fog.database.windows.net (never changes) |
| Read-only endpoint | ✅ Per secondary | ✅ fog.secondary.database.windows.net |
| Scope | Per database | Multiple databases (group) |
| Failover unit | One database at a time | All databases in the group together |
| DNS update on failover | ❌ Must update connection strings | ✅ Automatic DNS flip |
| Grace period | N/A | Configurable (default 60 min) |
| Works with SQL DB | ✅ | ✅ |
| Works with MI | ❌ | ✅ (all user DBs, all-or-nothing) |
| Replication | Async | Async |
| RPO | < 5 seconds | < 5 seconds |
| RTO | Manual trigger (~30 sec) | Grace period + ~30 sec |
| Cost | Secondary DB billed | Secondary DB billed |
What Each Option Is
Active Geo-Replication
A per-database feature that creates an asynchronous readable copy in another region. You manage failover manually. Each secondary has its own connection string.
Think of it as: "I want read replicas in other regions, and I'll handle failover myself."
Auto-Failover Groups
A managed group of databases with automatic failover, a single DNS endpoint that never changes, and built-in read/write + read-only listeners.
Think of it as: "I want Azure to handle everything: one endpoint, automatic failover, multiple databases fail over together."
Endpoint Behavior
This is the biggest practical difference and the most tested on DP-300.
Geo-Replication Endpoints
| Database | Endpoint |
|---|---|
| Primary | server1.database.windows.net |
| Secondary 1 | server2-region2.database.windows.net |
| Secondary 2 | server3-region3.database.windows.net |
Problem: On failover, your app must update its connection string to point to the new primary.
Failover Group Endpoints
| Endpoint | Points To | Changes on Failover? |
|---|---|---|
| fog-name.database.windows.net | Current primary | ❌ Never changes |
| fog-name.secondary.database.windows.net | Current secondary | ❌ Never changes |
Advantage: Your app connects to the fog endpoint and never needs to change connection strings, even after failover. DNS flips automatically.
DP-300 loves this question: "How do you ensure application connection strings don't change during failover?" → Auto-Failover Groups. The fog endpoint abstracts the primary/secondary; DNS updates happen automatically behind the scenes.
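The endpoint pattern above can be sketched in a few lines of Python. This is a minimal illustration, not an official API: the helper names `listener_hosts` and `connection_string` and the database name `appdb` are hypothetical, credentials are left out, and the `ApplicationIntent=ReadOnly` keyword is assumed to be supported by your driver.

```python
from dataclasses import dataclass

DNS_SUFFIX = "database.windows.net"  # suffix used by the endpoints in the tables above


@dataclass(frozen=True)
class Listeners:
    read_write: str  # follows the current primary
    read_only: str   # follows the current secondary


def listener_hosts(fog_name: str) -> Listeners:
    """Build the two failover group listener hostnames (hypothetical helper)."""
    return Listeners(
        read_write=f"{fog_name}.{DNS_SUFFIX}",
        read_only=f"{fog_name}.secondary.{DNS_SUFFIX}",
    )


def connection_string(host: str, database: str, read_only: bool = False) -> str:
    """Assemble an ODBC-style connection string; credentials deliberately omitted."""
    parts = [f"Server=tcp:{host},1433", f"Database={database}"]
    if read_only:
        # Assumption: the driver honours this keyword for read-only routing.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)


hosts = listener_hosts("fog-name")
print(connection_string(hosts.read_write, "appdb"))
print(connection_string(hosts.read_only, "appdb", read_only=True))
```

Because both strings are derived only from the failover group name, nothing in the application configuration refers to a specific server, which is why a failover needs no connection-string change.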
Failover Behavior
Geo-Replication Failover
- You detect the outage (or Azure notifies you)
- You manually trigger failover to a specific secondary
- That secondary becomes the new primary
- You update application connection strings to the new server
- The old primary (when recovered) becomes a secondary
Failover Group Failover
- Azure detects the outage
- Grace period passes (default 60 min, configurable)
- Automatic failover triggers (or you trigger manually before grace period)
- DNS endpoint fog.database.windows.net automatically points to the new primary
- Applications don't need any changes
- The old primary (when recovered) automatically becomes the secondary
Grace period trap: The default grace period is 60 minutes, so Azure waits 60 minutes after detecting the outage before it fails over automatically, and the database stays unavailable for that whole time. The grace period cannot be set below 1 hour, so if your RTO is shorter than that, plan to trigger a manual failover instead of waiting for the automatic one.
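The role swap described in both failover lists can be modelled with a toy class. This is only a sketch of the behaviour, with hypothetical server names `server1` and `server2`; it does not call any Azure API and ignores the grace period and DNS TTL.

```python
from dataclasses import dataclass


@dataclass
class FailoverGroup:
    """Toy model: the listener name is fixed, only its target changes."""
    name: str
    primary: str
    secondary: str

    @property
    def listener(self) -> str:
        return f"{self.name}.database.windows.net"

    def resolve(self) -> str:
        """Server the read/write listener currently points to."""
        return self.primary

    def fail_over(self) -> None:
        """Swap roles; the old primary becomes the secondary."""
        self.primary, self.secondary = self.secondary, self.primary


fog = FailoverGroup("fog", primary="server1", secondary="server2")
app_target = fog.listener      # the app stores this hostname once
before = fog.resolve()         # server1
fog.fail_over()
after = fog.resolve()          # server2
print(app_target, before, after)
```

With plain Geo-Replication the application would store `server1.database.windows.net` directly, so the equivalent of `fail_over()` would leave it pointing at the old primary until someone edits the connection string.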
Scope: Per-Database vs Group
| Scenario | Geo-Replication | Failover Groups |
|---|---|---|
| Fail over one database | ✅ | ❌ (all or nothing) |
| Fail over 10 databases together consistently | ❌ (10 separate failovers) | ✅ (one action) |
| Different databases in different regions | ✅ (flexible) | ❌ (one partner region) |
| Mix of databases with and without DR | ✅ (per DB choice) | ❌ (all in group) |
Production pattern: Use Failover Groups for your core application databases that must fail over together (consistency). Use Geo-Replication for analytics/reporting replicas that don't need automatic failover.
Managed Instance Differences
| Feature | SQL DB | MI |
|---|---|---|
| Geo-Replication | ✅ | ❌ Not supported |
| Failover Groups | ✅ | ✅ |
| FG scope | Selected databases | ALL user databases (all-or-nothing) |
| FG limit | Multiple groups per server | 1 failover group per MI |
MI critical facts: MI does NOT support Active Geo-Replication, only Failover Groups. And MI Failover Groups replicate ALL user databases; you can't choose which ones. Only 1 failover group per MI.
Choose This When...
| Requirement | Choose |
|---|---|
| "Connection strings must never change on failover" | Failover Groups |
| "Failover must be automatic" | Failover Groups |
| "I need readable replicas in 3+ regions" | Geo-Replication (up to 4 secondaries) |
| "Multiple databases must fail over as a unit" | Failover Groups |
| "I want control over which databases have DR" | Geo-Replication (per database) |
| "I use Managed Instance" | Failover Groups (only option) |
| "I need the simplest setup" | Failover Groups |
| "I need read locality in multiple regions" | Geo-Replication (secondary per region) |
| "Recommended by Microsoft for production DR" | Failover Groups |
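The table above can be condensed into a small decision helper. This is a sketch under stated assumptions: the function name `recommend`, its parameters, and the order in which the rules are checked are illustrative choices, not an official decision procedure.

```python
def recommend(*, managed_instance: bool = False,
              secondary_regions: int = 1,
              per_database_control: bool = False) -> str:
    """Map the requirements in the table above to an option (illustrative only).

    secondary_regions: number of regions that need a readable secondary.
    """
    if not 1 <= secondary_regions <= 4:
        raise ValueError("Geo-Replication supports 1 to 4 secondaries")
    if managed_instance:
        return "Failover Groups"   # only option on Managed Instance
    if secondary_regions > 1 or per_database_control:
        return "Geo-Replication"   # a failover group has a single partner
    # Automatic failover, stable endpoints, failing over as a unit,
    # and the simplest setup all point the same way.
    return "Failover Groups"


print(recommend(managed_instance=True))
print(recommend(secondary_regions=3))
print(recommend())
```

Managed Instance is checked first because it removes the choice entirely; everything else falls back to Failover Groups unless a requirement specifically needs per-database or multi-region secondaries.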
Common Misconceptions
| Misconception | Reality |
|---|---|
| "Geo-Replication has automatic failover" | ❌ Manual only. Failover Groups have automatic. |
| "Failover Groups support 4 secondaries" | ❌ Only 1 partner. Geo-Replication supports up to 4. |
| "I can choose which MI databases to replicate" | ❌ MI Failover Groups replicate ALL user databases. |
| "Failover is instant" | ❌ Grace period (default 60 min) + ~30 sec failover. |
| "Both protect against accidental DELETE" | ❌ Neither does. Deletion replicates. Use PITR. |
| "Geo-Replication works on MI" | ❌ MI only supports Failover Groups. |
| "Failover Group DNS takes hours to update" | ❌ DNS TTL is 30 seconds. Failover is fast once triggered. |
| "I need both for full DR" | ❌ Usually one or the other. FG is sufficient for most production scenarios. |
RPO/RTO Summary
| Solution | RPO | RTO |
|---|---|---|
| Geo-Replication (manual failover) | < 5 sec | Manual trigger + ~30 sec |
| Failover Groups (auto) | < 5 sec | Grace period + ~30 sec |
| Failover Groups (manual) | < 5 sec | ~30 sec |
| PITR (for comparison) | 5-10 min | Hours |
Both have identical RPO (< 5 seconds) because both use async replication. The RTO difference is the grace period: Geo-Rep depends on how fast you react, while Failover Groups wait for the grace period and then fail over automatically.
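The RTO arithmetic in the table can be written out explicitly. This is a back-of-the-envelope sketch: the 30-second failover duration and 60-minute default grace period are the approximate figures quoted above, and the 5-minute reaction time is an assumed example value.

```python
from datetime import timedelta

FAILOVER_DURATION = timedelta(seconds=30)     # the "~30 sec" figure from the table
DEFAULT_GRACE_PERIOD = timedelta(minutes=60)  # default grace period quoted above


def rto_automatic(grace_period: timedelta = DEFAULT_GRACE_PERIOD) -> timedelta:
    """Failover Group left to fail over on its own."""
    return grace_period + FAILOVER_DURATION


def rto_manual(reaction_time: timedelta) -> timedelta:
    """Manual failover (Geo-Rep or FOG); reaction_time is time until someone triggers it."""
    return reaction_time + FAILOVER_DURATION


print(rto_automatic())                    # 1:00:30
print(rto_manual(timedelta(minutes=5)))   # 0:05:30
```

The comparison makes the trade-off concrete: an operator who reacts within minutes beats the automatic path, while an unattended outage is bounded by the grace period.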
Anti-Patterns
- "Geo-Rep + manual app DNS = same as a FOG." Functionally close, but your DNS swap is now your RTO floor. Traffic Manager / Front Door TTLs add 30 s to minutes. FOG endpoint switches in seconds inside Azure DNS.
- "Set grace period to 0 to fail over fast." Transient SQL connectivity blips will fail you over to the paired region. Then you have to fail back. Recommended floor is 60 min for production; faster recovery is what manual forced failover is for.
- "Use a FOG to fail over one DB out of many." Impossible. FOG fails over all member DBs together. If the apps are independent, give each its own FOG, not one FOG with all DBs.
- "MI Geo-Replication." Doesn't exist. MI only supports Failover Groups (instance-scope, all databases). The exam plants "move a single MI database to another region" as a distractor; the answer is "not supported."
- "FOG protects against bad deploys." No. The bad schema change replicates to the secondary in seconds. PITR is the only protection against logical corruption / accidental DROP.
- "Read-only endpoint on FOG = free read scale." It points to the current secondary. After a failover the roles swap, so the read-only endpoint now resolves to the old primary in the other region. App reads can suddenly cross regions, and latency spikes silently.
FOG endpoints survive failover; database-level Geo-Rep endpoints don't. This is the single most asked DP-300 question on this topic.
Migration Between Options
| From → To | Path | Cost |
|---|---|---|
| Geo-Replication → Failover Group | Wrap existing replica into a new FOG referencing same secondary | Online; no reseed; gain auto-failover + endpoint |
| Failover Group → Geo-Replication | Drop FOG; existing geo-replica remains | Online; lose auto-failover + endpoints; app must update DNS |
| FOG (1 DB) → FOG (multi-DB) | Add DBs to existing FOG | Each new DB triggers a one-time seed |
| FOG → Different paired region | Drop + recreate; fresh seed | Compute cost during seed; brief unavailability of failover capability |
| Geo-Rep (sync) → Geo-Rep (async, default) | N/A: Geo-Rep is always async | (Trap question: there is no sync Geo-Rep on SQL DB) |
| Active Geo-Rep + custom DNS → FOG | Configure FOG, retire DNS, app uses FOG endpoints | Days of parallel-running recommended; rollback = bring DNS back |
Moves between the two are cheap (no data reseed in most paths). The one-way pain is changing the paired region, which forces a fresh secondary.
Real Scenarios
- Single SQL DB, 99.99% SLA, app must auto-recover → Failover Group, grace period 60 min. Driver: app uses FOG endpoint, no code change on failover.
- 3 logically independent SaaS DBs in same server, each with its own app → 3 separate Failover Groups. Driver: each app fails over on its own. Trade-off: 3 secondaries to pay for.
- Reporting workload reading from a stale secondary 200 km away → Active Geo-Replication, manual failover only, app talks to readable secondary directly. Driver: no need for auto-failover; reads off-loaded.
- MI hosting 80 databases for a legacy ERP → MI Failover Group. Driver: only option for MI. Trade-off accepted: all 80 DBs fail over together, no per-DB control.
- App depending on cross-DB queries within one server → One FOG containing all related DBs. Driver: cross-DB queries break if databases fail over independently and end up in different regions.
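The grouping rule behind the last scenarios (independent apps get their own FOG, databases linked by cross-DB queries share one) amounts to finding connected components. The sketch below is one way to plan that on paper; the function `plan_failover_groups` and the database names are hypothetical, and it says nothing about how the groups are actually created in Azure.

```python
from collections import defaultdict


def plan_failover_groups(databases, cross_db_links):
    """Group databases so any two linked by cross-DB queries share a group.

    databases: iterable of database names.
    cross_db_links: iterable of (db_a, db_b) pairs that query each other.
    Returns a list of sorted groups (the connected components).
    """
    neighbours = defaultdict(set)
    for a, b in cross_db_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for db in sorted(databases):
        if db in seen:
            continue
        stack, component = [db], []
        while stack:  # depth-first walk over linked databases
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.append(current)
            stack.extend(neighbours[current] - seen)
        groups.append(sorted(component))
    return groups


groups = plan_failover_groups(
    ["orders", "billing", "catalog", "tenant_a"],
    [("orders", "billing"), ("billing", "catalog")],
)
print(groups)  # [['billing', 'catalog', 'orders'], ['tenant_a']]
```

Each resulting group is a candidate failover group: `tenant_a` can fail over on its own, while the three linked databases must move together so their cross-DB queries keep working.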