Failover Groups vs Active Geo-Replication
Both provide cross-region disaster recovery for Azure SQL — but they work differently and suit different scenarios. This page makes the choice clear.
Side-by-Side Comparison
| Aspect | Active Geo-Replication | Auto-Failover Groups |
|---|---|---|
| What it is | Async readable secondary in any region | Managed failover with single endpoint |
| Max secondaries | 4 | 1 partner (primary ↔ secondary) |
| Auto failover | ❌ Manual only | ✅ Automatic (with grace period) |
| Single endpoint | ❌ Each secondary has its own | ✅ fog.database.windows.net (never changes) |
| Read-only endpoint | ✅ Per secondary | ✅ fog.secondary.database.windows.net |
| Scope | Per database | Multiple databases (group) |
| Failover unit | One database at a time | All databases in the group together |
| DNS update on failover | ❌ Must update connection strings | ✅ Automatic DNS flip |
| Grace period | N/A | Configurable (default 60 min) |
| Works with SQL DB | ✅ | ✅ |
| Works with MI | ❌ | ✅ (all user DBs, all-or-nothing) |
| Replication | Async | Async |
| RPO | < 5 seconds | < 5 seconds |
| RTO | Manual trigger (~30 sec) | Grace period + ~30 sec |
| Cost | Secondary DB billed | Secondary DB billed |
What Each Option Is
Active Geo-Replication
A per-database feature that creates an asynchronous readable copy in another region. You manage failover manually. Each secondary has its own connection string.
Think of it as: "I want read replicas in other regions, and I'll handle failover myself."
Auto-Failover Groups
A managed group of databases with automatic failover, a single DNS endpoint that never changes, and built-in read/write + read-only listeners.
Think of it as: "I want Azure to handle everything — one endpoint, automatic failover, multiple databases fail over together."
Endpoint Behavior
This is the biggest practical difference and the one most frequently tested on DP-300.
Geo-Replication Endpoints
| Database | Endpoint |
|---|---|
| Primary | server1.database.windows.net |
| Secondary 1 | server2-region2.database.windows.net |
| Secondary 2 | server3-region3.database.windows.net |
Problem: On failover, your app must update its connection string to point to the new primary.
Failover Group Endpoints
| Endpoint | Points To | Changes on Failover? |
|---|---|---|
| fog-name.database.windows.net | Current primary | ❌ Never changes |
| fog-name.secondary.database.windows.net | Current secondary | ❌ Never changes |
Advantage: Your app connects to the fog endpoint and never needs to change connection strings, even after failover. DNS flips automatically.
DP-300 loves this question: "How do you ensure application connection strings don't change during failover?" → Auto-Failover Groups. The fog endpoint abstracts the primary/secondary — DNS updates happen automatically behind the scenes.
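As a minimal sketch of why listener endpoints keep connection strings stable, the snippet below builds ODBC-style connection strings from a failover group name. The database name `appdb` and the helpers `listener_host` and `build_connection_string` are hypothetical names chosen for illustration, and authentication settings are left out. `ApplicationIntent=ReadOnly` is the standard ODBC keyword for declaring a read-only session.

```python
def listener_host(fog_name: str, read_only: bool = False) -> str:
    """Return the failover group listener host name.

    The read-write listener follows the current primary; the
    ".secondary" listener follows the current secondary.
    """
    suffix = ".secondary" if read_only else ""
    return f"{fog_name}{suffix}.database.windows.net"


def build_connection_string(fog_name: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC-style connection string that targets a listener, not a server.

    Credentials are intentionally omitted from this sketch.
    """
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{listener_host(fog_name, read_only)},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Marks the session as read-only for the secondary listener.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


read_write = build_connection_string("fog-name", "appdb")
read_only = build_connection_string("fog-name", "appdb", read_only=True)
print(read_write)
print(read_only)
```

Neither string names a physical server, so nothing in the application configuration has to change when the roles swap.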
Failover Behavior
Geo-Replication Failover
- You detect the outage (or Azure notifies you)
- You manually trigger failover to a specific secondary
- That secondary becomes the new primary
- You update application connection strings to the new server
- The old primary (when recovered) becomes a secondary
Failover Group Failover
- Azure detects the outage
- Grace period passes (default 60 min — configurable)
- Automatic failover triggers (or you trigger it manually before the grace period ends)
- DNS endpoint fog.database.windows.net automatically points to the new primary
- Applications don't need any changes
- The old primary (when recovered) automatically becomes the secondary
Grace period trap: The default grace period is 60 minutes, so Azure waits 60 minutes before failing over automatically, and the database is unavailable during that time. One hour is also the lowest value you can set, so you cannot shorten the wait; if your RTO is tighter than that, trigger a manual failover immediately instead of waiting. Choose the grace period based on your RTO requirement.
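The failover group steps above can be pictured with a toy model. This is only an illustration of the role swap behind fixed listener names, not how Azure implements it; the `FailoverGroup` class and its methods are hypothetical, and the server names are borrowed from the endpoint tables earlier on this page.

```python
from dataclasses import dataclass


@dataclass
class FailoverGroup:
    """Toy model: two servers behind two listener names that never change."""

    name: str
    primary: str
    secondary: str

    def resolve(self, host: str) -> str:
        """Mimic DNS: map a listener name to the server currently behind it."""
        listeners = {
            f"{self.name}.database.windows.net": self.primary,
            f"{self.name}.secondary.database.windows.net": self.secondary,
        }
        return listeners[host]

    def fail_over(self) -> None:
        """Swap roles; the old primary becomes the new secondary."""
        self.primary, self.secondary = self.secondary, self.primary


fog = FailoverGroup(
    name="fog",
    primary="server1.database.windows.net",
    secondary="server2-region2.database.windows.net",
)
app_endpoint = "fog.database.windows.net"  # the only name the app stores

before = fog.resolve(app_endpoint)
fog.fail_over()
after = fog.resolve(app_endpoint)
print(before, "->", after)
```

The application keeps using `app_endpoint` throughout; only the server behind the name changes. With Geo-Replication there is no such indirection, which is why the connection string itself has to be edited.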
Scope: Per-Database vs Group
| Scenario | Geo-Replication | Failover Groups |
|---|---|---|
| Fail over one database | ✅ | ❌ (all or nothing) |
| Fail over 10 databases together consistently | ❌ (10 separate failovers) | ✅ (one action) |
| Different databases in different regions | ✅ (flexible) | ❌ (one partner region) |
| Mix of databases with and without DR | ✅ (per DB choice) | ❌ (all in group) |
Production pattern: Use Failover Groups for your core application databases that must fail over together (consistency). Use Geo-Replication for analytics/reporting replicas that don't need automatic failover.
Managed Instance Differences
| Feature | SQL DB | MI |
|---|---|---|
| Geo-Replication | ✅ | ❌ Not supported |
| Failover Groups | ✅ | ✅ |
| FG scope | Selected databases | ALL user databases (all-or-nothing) |
| FG limit | Multiple groups per server | 1 failover group per MI |
MI critical facts: MI does NOT support Active Geo-Replication — only Failover Groups. And MI Failover Groups replicate ALL user databases — you can't choose which ones. Only 1 failover group per MI.
Choose This When...
| Requirement | Choose |
|---|---|
| "Connection strings must never change on failover" | Failover Groups |
| "Failover must be automatic" | Failover Groups |
| "I need readable replicas in 3+ regions" | Geo-Replication (up to 4 secondaries) |
| "Multiple databases must fail over as a unit" | Failover Groups |
| "I want control over which databases have DR" | Geo-Replication (per database) |
| "I use Managed Instance" | Failover Groups (only option) |
| "I need the simplest setup" | Failover Groups |
| "I need read locality in multiple regions" | Geo-Replication (secondary per region) |
| "Recommended by Microsoft for production DR" | Failover Groups |
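The table above can be read as a small decision procedure. The sketch below encodes one possible reading of it; the `Requirements` fields, the `recommend` function, and the priority order (Managed Instance first, then automatic or grouped failover, then multi-region reads) are assumptions made for illustration, not an official rule.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist mirroring rows of the table above."""

    uses_managed_instance: bool = False
    needs_automatic_failover: bool = False
    needs_stable_connection_string: bool = False
    needs_group_failover: bool = False
    needs_per_database_choice: bool = False
    readable_secondary_regions: int = 1  # regions that need a readable secondary


def recommend(req: Requirements) -> str:
    """Return the option the table points to, under the assumed priority order."""
    # Managed Instance supports Failover Groups only.
    if req.uses_managed_instance:
        return "Failover Groups"
    # These three requirements are only met by Failover Groups.
    if (
        req.needs_automatic_failover
        or req.needs_stable_connection_string
        or req.needs_group_failover
    ):
        return "Failover Groups"
    # A failover group has one partner; Geo-Replication allows up to 4 secondaries.
    if req.readable_secondary_regions > 1 or req.needs_per_database_choice:
        return "Geo-Replication"
    # Default: the simplest setup.
    return "Failover Groups"


print(recommend(Requirements(readable_secondary_regions=3)))
print(recommend(Requirements(needs_automatic_failover=True)))
```

When requirements conflict (for example, automatic failover and readable replicas in three regions), this sketch simply picks Failover Groups; a real design would need to weigh the two explicitly.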
Common Misconceptions
| Misconception | Reality |
|---|---|
| "Geo-Replication has automatic failover" | ❌ Manual only. Failover Groups have automatic. |
| "Failover Groups support 4 secondaries" | ❌ Only 1 partner. Geo-Replication supports up to 4. |
| "I can choose which MI databases to replicate" | ❌ MI Failover Groups replicate ALL user databases. |
| "Failover is instant" | ❌ Grace period (default 60 min) + ~30 sec failover. |
| "Both protect against accidental DELETE" | ❌ Neither does. Deletion replicates. Use PITR. |
| "Geo-Replication works on MI" | ❌ MI only supports Failover Groups. |
| "Failover Group DNS takes hours to update" | ❌ DNS TTL is 30 seconds. Failover is fast once triggered. |
| "I need both for full DR" | ❌ Usually one or the other. FG is sufficient for most production scenarios. |
RPO/RTO Summary
| Solution | RPO | RTO |
|---|---|---|
| Geo-Replication (manual failover) | < 5 sec | Manual trigger + ~30 sec |
| Failover Groups (auto) | < 5 sec | Grace period + ~30 sec |
| Failover Groups (manual) | < 5 sec | ~30 sec |
| PITR (for comparison) | 5-10 min | Hours |
Both have identical RPO (< 5 seconds) because both use async replication. The RTO difference is the grace period — Geo-Rep depends on how fast you react, Failover Groups wait for the grace period then auto-failover.
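The RTO column is simple addition: the time spent waiting before failover starts, plus the failover itself. The sketch below works through the three rows using the figures from the table (60 minute default grace period, roughly 30 second failover). The 10 minute operator reaction time for Geo-Replication is a made-up value for illustration, and `worst_case_rto` is a hypothetical helper.

```python
from datetime import timedelta

FAILOVER_DURATION = timedelta(seconds=30)  # the "~30 sec" figure from the table


def worst_case_rto(wait_before_failover: timedelta) -> timedelta:
    """RTO = time waited before failover starts + time the failover takes."""
    return wait_before_failover + FAILOVER_DURATION


# Failover Groups (auto): wait out the default 60 minute grace period.
auto_fog = worst_case_rto(timedelta(minutes=60))
# Failover Groups (manual): triggered immediately, no wait.
manual_fog = worst_case_rto(timedelta(0))
# Geo-Replication: depends on operator reaction time (assumed 10 minutes here).
geo_rep = worst_case_rto(timedelta(minutes=10))

print("Failover Groups (auto):  ", auto_fog)    # 1:00:30
print("Failover Groups (manual):", manual_fog)  # 0:00:30
print("Geo-Replication (manual):", geo_rep)     # 0:10:30
```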