Skip to main content

Lab: Active Geo-Replication & Auto-Failover Groups

Deploy an Azure SQL Database with active geo-replication across two regions, configure an auto-failover group with a grace period, trigger failover, and measure real-world RPO/RTO.

DP-300 Domain: 5 — Plan and Configure HA/DR (20-25%)

Architecture

Geo-replication & auto-failover groups lab architecture: West Europe primary, North Europe async secondary, fg-geolab listener, application traffic

Geo-Replication vs Failover Groups

When to Use Which
🌍
Active Geo-Replication
Up to 4 readable secondaries in any region. Manual failover only. Per-database level. Use for: read offload, custom failover logic.
🔄
Auto-Failover Groups
One secondary region. Automatic + manual failover. Server-level (all DBs fail over together). Use for: production DR with auto-recovery.

What Gets Deployed

ComponentResourceRegion
Primary SQL Serversql-geo-primary-<random>West Europe
Secondary SQL Serversql-geo-secondary-<random>North Europe
DatabaseGeoLabDB (S1 tier)Both (replicated)
Failover Groupfg-geolab-<random>Cross-region

Lab Steps

Deploy Primary Database
Create Azure SQL Server + database in West Europe with sample data
Create Geo-Replica
Deploy secondary server in North Europe, configure active geo-replication
Configure Failover Group
Create auto-failover group with grace period and read/write listener
Test Failover
Trigger manual failover, verify data consistency, measure RTO
Test Read-Only Routing
Connect to read-only listener, verify queries route to secondary
Simulate Outage Recovery
Force failover, verify automatic recovery, measure RPO

Step 1: Deploy Primary Database

$resourceGroup = "rg-geo-lab"
$primaryLocation = "westeurope"
$secondaryLocation = "northeurope"
$suffix = Get-Random -Maximum 9999
$primaryServer = "sql-geo-primary-$suffix"
$secondaryServer = "sql-geo-secondary-$suffix"
$failoverGroup = "fg-geolab-$suffix"
$adminPassword = "<YOUR_STRONG_PASSWORD>"

# Create resource group
az group create --name $resourceGroup --location $primaryLocation

# Create primary SQL server
az sql server create `
--resource-group $resourceGroup `
--name $primaryServer `
--admin-user sqladmin `
--admin-password $adminPassword `
--location $primaryLocation

# Allow Azure services (for cross-region replication)
az sql server firewall-rule create `
--resource-group $resourceGroup `
--server $primaryServer `
--name AllowAzureServices `
--start-ip-address 0.0.0.0 `
--end-ip-address 0.0.0.0

# Allow your IP
$myIp = (Invoke-RestMethod -Uri https://api.ipify.org)
az sql server firewall-rule create `
--resource-group $resourceGroup `
--server $primaryServer `
--name AllowMyIP `
--start-ip-address $myIp `
--end-ip-address $myIp

# Create database with sample data
az sql db create `
--resource-group $resourceGroup `
--server $primaryServer `
--name GeoLabDB `
--service-objective S1 `
--sample-name AdventureWorksLT

Step 2: Create Geo-Replica

# Create secondary SQL server in another region
az sql server create `
--resource-group $resourceGroup `
--name $secondaryServer `
--admin-user sqladmin `
--admin-password $adminPassword `
--location $secondaryLocation

# Allow Azure services on secondary
az sql server firewall-rule create `
--resource-group $resourceGroup `
--server $secondaryServer `
--name AllowAzureServices `
--start-ip-address 0.0.0.0 `
--end-ip-address 0.0.0.0

# Allow your IP on secondary
az sql server firewall-rule create `
--resource-group $resourceGroup `
--server $secondaryServer `
--name AllowMyIP `
--start-ip-address $myIp `
--end-ip-address $myIp

# Create active geo-replication link
az sql db replica create `
--resource-group $resourceGroup `
--server $primaryServer `
--name GeoLabDB `
--partner-server $secondaryServer `
--partner-resource-group $resourceGroup `
--secondary-type Geo

Verify replication is active:

az sql db replica list-links `
--resource-group $resourceGroup `
--server $primaryServer `
--name GeoLabDB `
--query "[].{Role:role, State:replicationState, Partner:partnerServer}" `
-o table

Expected:

Role      State        Partner
-------- ----------- ----------------------
Primary CATCH_UP sql-geo-secondary-1234
⚠️ Watch Out

CATCH_UP means the secondary is still syncing. Wait until it shows SEEDINGCATCH_UPthe replica is usable once it shows up in the portal as "Online". For a small S1 database, this takes 1-3 minutes.

Step 3: Configure Auto-Failover Group

# Create failover group
az sql failover-group create `
--resource-group $resourceGroup `
--server $primaryServer `
--name $failoverGroup `
--partner-server $secondaryServer `
--partner-resource-group $resourceGroup `
--failover-policy Automatic `
--grace-period 60

# Check failover group status
az sql failover-group show `
--resource-group $resourceGroup `
--server $primaryServer `
--name $failoverGroup `
--query "{Name:name, Role:replicationRole, GracePeriod:readWriteEndpoint.failoverPolicy}" `
-o table

Connection strings (use these in applications):

EndpointDNS NamePurpose
Read-Writefg-geolab-<suffix>.database.windows.netRoutes to current primary
Read-Onlyfg-geolab-<suffix>.secondary.database.windows.netRoutes to secondary replica

Step 4: Test Manual Failover

# Insert a timestamp row on primary (for RPO measurement)
$timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
Write-Host "Pre-failover timestamp: $timestamp"

# Trigger planned failover (no data loss)
az sql failover-group set-primary `
--resource-group $resourceGroup `
--server $secondaryServer `
--name $failoverGroup

# Verify roles swapped
az sql failover-group show `
--resource-group $resourceGroup `
--server $secondaryServer `
--name $failoverGroup `
--query "{Name:name, Role:replicationRole}" `
-o table

Expected: The secondary server is now Primary.

Step 5: Test Read-Only Routing

-- Connect to the read-only endpoint:
-- fg-geolab-<suffix>.secondary.database.windows.net

-- This query should work (reads are allowed)
SELECT TOP 10 CustomerID, CompanyName
FROM SalesLT.Customer
ORDER BY CustomerID;

-- This query should FAIL (writes are blocked on read-only replica)
INSERT INTO SalesLT.Customer (FirstName, LastName, CompanyName)
VALUES ('Test', 'User', 'Failover Test Inc.');
-- Error: Failed to update database because the database is read-only.
🏢 Real-World DBA Note

Use read-only routing to offload reporting queries from the primary. In customer scenarios, BI dashboards connect to the secondary endpoint, keeping the primary available for transactional workloads.

Step 6: Fail Back to Original Primary

# Fail back to original primary
az sql failover-group set-primary `
--resource-group $resourceGroup `
--server $primaryServer `
--name $failoverGroup

# Verify we're back
az sql failover-group show `
--resource-group $resourceGroup `
--server $primaryServer `
--name $failoverGroup `
--query "{Primary:replicationRole, GracePeriod:readWriteEndpoint.failoverWithDataLossGracePeriod}" `
-o table

Measuring RPO and RTO

MetricPlanned FailoverForced Failover
RPO0 (no data loss)Depends on replication lag (typically < 5 sec)
RTO~30-60 secondsGrace period + recovery (60 min default → configurable)
DNS TTL30 secondsApplication reconnects to same listener URL
-- Check replication lag (run on primary)
SELECT
ag.partner_server,
ag.replication_state_desc,
ag.last_replication,
DATEDIFF(SECOND, ag.last_replication, GETUTCDATE()) AS lag_seconds
FROM sys.dm_geo_replication_link_status ag;

Verification Checklist

CheckExpectedCommand
Failover group existsfg-geolab-*az sql failover-group list ...
Read-write routes to primaryQueries succeedConnect to .database.windows.net
Read-only routes to secondaryReads succeed, writes failConnect to .secondary.database.windows.net
Planned failover completesRoles swap, 0 data lossaz sql failover-group set-primary
Failback completesOriginal roles restoredSame command, original server

Common Failures

SymptomLikely CauseFix
Failover Group create fails: "server not empty on secondary region"DB with same name exists on target serverDrop or rename target DB; FOG requires empty target
Geo-secondary stuck CATCH_UP foreverInitial seed still copying (large DB)Wait — ~150 GB/hr typical; monitor sys.dm_geo_replication_link_status
App connects to read-only listener and gets read-write errorsConnection string uses RW listener for read trafficSplit: use <fog>.database.windows.net (RW) and <fog>.secondary.database.windows.net (RO)
Auto-failover never triggers despite outageGrace period (default 1h) not yet elapsedTune --failover-policy Automatic --grace-period-with-data-loss-hours 1
Forced failover loses transactionsAsync replication lag at moment of failoverTrack lag via replication_lag_sec; tune SLA
⚠️ Watch Out

Failover Group requires same service tier on both servers. Mixing General Purpose primary + Hyperscale secondary will reject FOG creation. Match tiers, then upgrade in lockstep.

Variations to Try

  • Test planned failover vs forced failover — observe difference in replication_lag_sec
  • Add a 3rd geo-secondary via active geo-replication (FOG supports up to 4 secondaries since 2024)
  • Wire Action Group + alert on replication_lag_sec > 60s
  • Drop the auto-failover policy temporarily and use manual failover only for compliance scenarios
  • Pair FOG with LTR geo-redundant for layered DR + retention

Production Application

  • PaaS SaaS app, multi-region active-passive, RTO < 60 sec → FOG with auto-failover, grace period 1h.
  • Compliance-bound: explicit operator-driven failover only → FOG with manual failover policy.
  • Multi-tenant SaaS, per-tenant database → FOG covers all DBs on the server in one flip.
  • Anti-pattern: don't use FOG when you need sub-second RPO across regions — SQL is async cross-region. Use app-level dual-write or eventual consistency designs.

Clean Up

az group delete --name rg-geo-lab --yes --no-wait

Key Takeaways

What is the key difference between active geo-replication and auto-failover groups?
Click to reveal answer
**Geo-replication**: per-database, up to 4 secondaries, manual failover only, any region. **Failover groups**: server-level (all DBs together), 1 secondary, automatic failover with grace period, built-in read/write + read-only DNS listeners.
1 / 4