Lab: Active Geo-Replication & Auto-Failover Groups

Deploy an Azure SQL Database with active geo-replication across two regions, configure an auto-failover group with a grace period, trigger failover, and measure real-world RPO/RTO.

DP-300 Domain: 5 — Plan and Configure HA/DR (20-25%)

Architecture

Geo-replication & auto-failover groups lab architecture: West Europe primary, North Europe async secondary, fg-geolab listener, application traffic

Geo-Replication vs Failover Groups

When to Use Which

🌍

Active Geo-Replication

Up to 4 readable secondaries in any region. Manual failover only. Per-database level. Use for: read offload, custom failover logic.

🔄

Auto-Failover Groups

One secondary region. Automatic + manual failover. Server-level (all DBs fail over together). Use for: production DR with auto-recovery.

What Gets Deployed

Component	Resource	Region
Primary SQL Server	`sql-geo-primary-<random>`	West Europe
Secondary SQL Server	`sql-geo-secondary-<random>`	North Europe
Database	`GeoLabDB` (S1 tier)	Both (replicated)
Failover Group	`fg-geolab-<random>`	Cross-region

Lab Steps

Deploy Primary Database

Create Azure SQL Server + database in West Europe with sample data

Create Geo-Replica

Deploy secondary server in North Europe, configure active geo-replication

Configure Failover Group

Create auto-failover group with grace period and read/write listener

Test Failover

Trigger manual failover, verify data consistency, measure RTO

Test Read-Only Routing

Connect to read-only listener, verify queries route to secondary

Simulate Outage Recovery

Force failover, verify automatic recovery, measure RPO

Step 1: Deploy Primary Database

$resourceGroup = "rg-geo-lab"
$primaryLocation = "westeurope"
$secondaryLocation = "northeurope"
$suffix = Get-Random -Maximum 9999
$primaryServer = "sql-geo-primary-$suffix"
$secondaryServer = "sql-geo-secondary-$suffix"
$failoverGroup = "fg-geolab-$suffix"
$adminPassword = "<YOUR_STRONG_PASSWORD>"

# Create resource group
az group create --name $resourceGroup --location $primaryLocation

# Create primary SQL server
az sql server create `
    --resource-group $resourceGroup `
    --name $primaryServer `
    --admin-user sqladmin `
    --admin-password $adminPassword `
    --location $primaryLocation

# Allow Azure services (for cross-region replication)
az sql server firewall-rule create `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name AllowAzureServices `
    --start-ip-address 0.0.0.0 `
    --end-ip-address 0.0.0.0

# Allow your IP
$myIp = (Invoke-RestMethod -Uri https://api.ipify.org)
az sql server firewall-rule create `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name AllowMyIP `
    --start-ip-address $myIp `
    --end-ip-address $myIp

# Create database with sample data
az sql db create `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name GeoLabDB `
    --service-objective S1 `
    --sample-name AdventureWorksLT

Step 2: Create Geo-Replica

# Create secondary SQL server in another region
az sql server create `
    --resource-group $resourceGroup `
    --name $secondaryServer `
    --admin-user sqladmin `
    --admin-password $adminPassword `
    --location $secondaryLocation

# Allow Azure services on secondary
az sql server firewall-rule create `
    --resource-group $resourceGroup `
    --server $secondaryServer `
    --name AllowAzureServices `
    --start-ip-address 0.0.0.0 `
    --end-ip-address 0.0.0.0

# Allow your IP on secondary
az sql server firewall-rule create `
    --resource-group $resourceGroup `
    --server $secondaryServer `
    --name AllowMyIP `
    --start-ip-address $myIp `
    --end-ip-address $myIp

# Create active geo-replication link
az sql db replica create `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name GeoLabDB `
    --partner-server $secondaryServer `
    --partner-resource-group $resourceGroup `
    --secondary-type Geo

Verify replication is active:

az sql db replica list-links `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name GeoLabDB `
    --query "[].{Role:role, State:replicationState, Partner:partnerServer}" `
    -o table

Expected:

Role      State        Partner
--------  -----------  ----------------------
Primary   CATCH_UP     sql-geo-secondary-1234

⚠️ Watch Out

CATCH_UP means the secondary is still syncing. Wait until it shows SEEDING → CATCH_UP → the replica is usable once it shows up in the portal as "Online". For a small S1 database, this takes 1-3 minutes.

Step 3: Configure Auto-Failover Group

# Create failover group
az sql failover-group create `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name $failoverGroup `
    --partner-server $secondaryServer `
    --partner-resource-group $resourceGroup `
    --failover-policy Automatic `
    --grace-period 60

# Check failover group status
az sql failover-group show `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name $failoverGroup `
    --query "{Name:name, Role:replicationRole, GracePeriod:readWriteEndpoint.failoverPolicy}" `
    -o table

Connection strings (use these in applications):

Endpoint	DNS Name	Purpose
Read-Write	`fg-geolab-<suffix>.database.windows.net`	Routes to current primary
Read-Only	`fg-geolab-<suffix>.secondary.database.windows.net`	Routes to secondary replica

Step 4: Test Manual Failover

# Insert a timestamp row on primary (for RPO measurement)
$timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
Write-Host "Pre-failover timestamp: $timestamp"

# Trigger planned failover (no data loss)
az sql failover-group set-primary `
    --resource-group $resourceGroup `
    --server $secondaryServer `
    --name $failoverGroup

# Verify roles swapped
az sql failover-group show `
    --resource-group $resourceGroup `
    --server $secondaryServer `
    --name $failoverGroup `
    --query "{Name:name, Role:replicationRole}" `
    -o table

Expected: The secondary server is now Primary.

Step 5: Test Read-Only Routing

-- Connect to the read-only endpoint:
-- fg-geolab-<suffix>.secondary.database.windows.net

-- This query should work (reads are allowed)
SELECT TOP 10 CustomerID, CompanyName
FROM SalesLT.Customer
ORDER BY CustomerID;

-- This query should FAIL (writes are blocked on read-only replica)
INSERT INTO SalesLT.Customer (FirstName, LastName, CompanyName)
VALUES ('Test', 'User', 'Failover Test Inc.');
-- Error: Failed to update database because the database is read-only.

🏢 Real-World DBA Note

Use read-only routing to offload reporting queries from the primary. In customer scenarios, BI dashboards connect to the secondary endpoint, keeping the primary available for transactional workloads.

Step 6: Fail Back to Original Primary

# Fail back to original primary
az sql failover-group set-primary `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name $failoverGroup

# Verify we're back
az sql failover-group show `
    --resource-group $resourceGroup `
    --server $primaryServer `
    --name $failoverGroup `
    --query "{Primary:replicationRole, GracePeriod:readWriteEndpoint.failoverWithDataLossGracePeriod}" `
    -o table

Measuring RPO and RTO

Metric	Planned Failover	Forced Failover
RPO	0 (no data loss)	Depends on replication lag (typically < 5 sec)
RTO	~30-60 seconds	Grace period + recovery (60 min default → configurable)
DNS TTL	30 seconds	Application reconnects to same listener URL

-- Check replication lag (run on primary)
SELECT
    ag.partner_server,
    ag.replication_state_desc,
    ag.last_replication,
    DATEDIFF(SECOND, ag.last_replication, GETUTCDATE()) AS lag_seconds
FROM sys.dm_geo_replication_link_status ag;

Verification Checklist

Check	Expected	Command
Failover group exists	`fg-geolab-*`	`az sql failover-group list ...`
Read-write routes to primary	Queries succeed	Connect to `.database.windows.net`
Read-only routes to secondary	Reads succeed, writes fail	Connect to `.secondary.database.windows.net`
Planned failover completes	Roles swap, 0 data loss	`az sql failover-group set-primary`
Failback completes	Original roles restored	Same command, original server

Common Failures

Symptom	Likely Cause	Fix
Failover Group create fails: "server not empty on secondary region"	DB with same name exists on target server	Drop or rename target DB; FOG requires empty target
Geo-secondary stuck CATCH_UP forever	Initial seed still copying (large DB)	Wait — ~150 GB/hr typical; monitor `sys.dm_geo_replication_link_status`
App connects to read-only listener and gets read-write errors	Connection string uses RW listener for read traffic	Split: use `<fog>.database.windows.net` (RW) and `<fog>.secondary.database.windows.net` (RO)
Auto-failover never triggers despite outage	Grace period (default 1h) not yet elapsed	Tune `--failover-policy Automatic --grace-period-with-data-loss-hours 1`
Forced failover loses transactions	Async replication lag at moment of failover	Track lag via `replication_lag_sec`; tune SLA

⚠️ Watch Out

Failover Group requires same service tier on both servers. Mixing General Purpose primary + Hyperscale secondary will reject FOG creation. Match tiers, then upgrade in lockstep.

Variations to Try

Test planned failover vs forced failover — observe difference in replication_lag_sec
Add a 3rd geo-secondary via active geo-replication (FOG supports up to 4 secondaries since 2024)
Wire Action Group + alert on replication_lag_sec > 60s
Drop the auto-failover policy temporarily and use manual failover only for compliance scenarios
Pair FOG with LTR geo-redundant for layered DR + retention

Production Application

PaaS SaaS app, multi-region active-passive, RTO < 60 sec → FOG with auto-failover, grace period 1h.
Compliance-bound: explicit operator-driven failover only → FOG with manual failover policy.
Multi-tenant SaaS, per-tenant database → FOG covers all DBs on the server in one flip.
Anti-pattern: don't use FOG when you need sub-second RPO across regions — SQL is async cross-region. Use app-level dual-write or eventual consistency designs.

Clean Up

az group delete --name rg-geo-lab --yes --no-wait

Key Takeaways

What is the key difference between active geo-replication and auto-failover groups?

Click to reveal answer

**Geo-replication**: per-database, up to 4 secondaries, manual failover only, any region. **Failover groups**: server-level (all DBs together), 1 secondary, automatic failover with grace period, built-in read/write + read-only DNS listeners.

1 / 4

Architecture​

Geo-Replication vs Failover Groups​

What Gets Deployed​

Lab Steps​

Step 1: Deploy Primary Database​

Step 2: Create Geo-Replica​

Step 3: Configure Auto-Failover Group​

Step 4: Test Manual Failover​

Step 5: Test Read-Only Routing​

Step 6: Fail Back to Original Primary​

Measuring RPO and RTO​

Verification Checklist​

Common Failures​

Variations to Try​

Production Application​

Clean Up​

Key Takeaways​