Lab: Cross-Region AlwaysOn Availability Groups

Extend the single-region AlwaysOn AG into a multi-region setup — deploy a 3rd SQL replica in a different Azure region with async commit, configure distributed AG or multi-subnet AG, and understand the trade-offs between same-region sync commit and cross-region async commit.

DP-300 Domain: 5 — Plan and Configure HA/DR (20-25%)

🏢 Real-World DBA Note

Real customer scenario: your primary AG runs in West Europe with sync commit and automatic failover. A 3rd replica in North Europe provides async DR. If the entire West Europe region goes down, you can manually failover to North Europe with minimal data loss (RPO < 5 seconds under normal load).

Why Cross-Region?

Same-Region vs Cross-Region AG

🏠

Same-Region AG (built in previous lab)

Sync commit, automatic failover

Zero data loss (RPO = 0)

Protects: VM/rack/fault domain failures

Does NOT protect: region-wide outages

🌍

Cross-Region AG (this lab)

Async commit to remote replica

Near-zero data loss (RPO < 5s)

Protects: entire region failures

Manual failover with potential data loss

Architecture

Cross-region AlwaysOn AG lab: West Europe sync pair (sqlvm-ha1 primary + sqlvm-ha2 sync secondary) with North Europe async DR replica via global VNET peering

What Gets Deployed

Component	Resource	Region	Purpose
Primary AG	`sqlvm-ha1` + `sqlvm-ha2`	West Europe	Sync commit pair with automatic failover
DR Replica	`sqlvm-dr1` (10.1.1.10)	North Europe	Async commit for region-level DR
VNET Peering	Global peering	Cross-region	Connects the two VNETs
DR VNET	`vnet-sql-dr` (10.1.0.0/16)	North Europe	Network for DR replica
NSG	`nsg-sql-dr`	North Europe	Firewall for DR subnet

Prerequisites

Completed the AlwaysOn HA Setup lab — the primary 2-node AG must be running
Azure subscription with quota for 1 additional Standard_D2ads_v6 VM
Existing domain sqlha.local running on dc-vm

Lab Steps

Deploy DR Infrastructure

Create VNET in North Europe, deploy SQL VM, configure VNET peering

Domain-Join DR Replica

Configure DNS to use dc-vm, join sqlvm-dr1 to sqlha.local domain

Install Clustering & Add to WSFC

Install failover clustering feature, add node to existing cluster

Add Async Replica to AG

Add sqlvm-dr1 as async commit replica to the existing AG

Test Cross-Region Failover

Force manual failover to DR, measure data loss, failback to primary

Compare Sync vs Async Performance

Measure commit latency, replication lag, and transaction throughput

Step 1: Deploy DR Infrastructure

$drResourceGroup = "rg-sql-ha"       # Same RG or separate — your choice
$drLocation = "northeurope"
$primaryVnet = "vnet-sql-ha"
$primaryRG = "rg-sql-ha"

# Create DR VNET in North Europe
az network vnet create `
    --resource-group $drResourceGroup `
    --name vnet-sql-dr `
    --location $drLocation `
    --address-prefixes 10.1.0.0/16 `
    --subnet-name dr-subnet `
    --subnet-prefixes 10.1.1.0/24

# Create NSG for DR subnet
az network nsg create `
    --resource-group $drResourceGroup `
    --name nsg-sql-dr `
    --location $drLocation

# Allow SQL, HADR mirroring, RDP, and WSFC
az network nsg rule create --resource-group $drResourceGroup --nsg-name nsg-sql-dr `
    --name AllowSQL --priority 100 --direction Inbound --access Allow `
    --protocol Tcp --destination-port-range 1433 --source-address-prefix VirtualNetwork
az network nsg rule create --resource-group $drResourceGroup --nsg-name nsg-sql-dr `
    --name AllowHADR --priority 200 --direction Inbound --access Allow `
    --protocol Tcp --destination-port-range 5022 --source-address-prefix VirtualNetwork
az network nsg rule create --resource-group $drResourceGroup --nsg-name nsg-sql-dr `
    --name AllowRDP --priority 300 --direction Inbound --access Allow `
    --protocol Tcp --destination-port-range 3389 --source-address-prefix '*'
az network nsg rule create --resource-group $drResourceGroup --nsg-name nsg-sql-dr `
    --name AllowWSFC --priority 400 --direction Inbound --access Allow `
    --protocol '*' --destination-port-ranges 135 5985 49152-65535 --source-address-prefix VirtualNetwork

# Associate NSG with DR subnet
az network vnet subnet update `
    --resource-group $drResourceGroup `
    --vnet-name vnet-sql-dr `
    --name dr-subnet `
    --network-security-group nsg-sql-dr

# Deploy DR SQL VM
az vm create `
    --resource-group $drResourceGroup `
    --name sqlvm-dr1 `
    --image MicrosoftSQLServer:sql2022-ws2022:enterprise-gen2:latest `
    --size Standard_D2ads_v6 `
    --admin-username azureadmin `
    --admin-password '<YOUR_PASSWORD>' `
    --vnet-name vnet-sql-dr `
    --subnet dr-subnet `
    --private-ip-address 10.1.1.10 `
    --public-ip-sku Standard `
    --nsg nsg-sql-dr `
    --location $drLocation

Step 2: Configure VNET Peering

# Get VNET resource IDs
$vnet1Id = az network vnet show --resource-group $primaryRG --name $primaryVnet --query id -o tsv
$vnet2Id = az network vnet show --resource-group $drResourceGroup --name vnet-sql-dr --query id -o tsv

# Create peering: Primary → DR
az network vnet peering create `
    --resource-group $primaryRG `
    --name peer-primary-to-dr `
    --vnet-name $primaryVnet `
    --remote-vnet $vnet2Id `
    --allow-vnet-access `
    --allow-forwarded-traffic

# Create peering: DR → Primary
az network vnet peering create `
    --resource-group $drResourceGroup `
    --name peer-dr-to-primary `
    --vnet-name vnet-sql-dr `
    --remote-vnet $vnet1Id `
    --allow-vnet-access `
    --allow-forwarded-traffic

# Verify peering status
az network vnet peering list --resource-group $primaryRG --vnet-name $primaryVnet -o table

⚠️ Watch Out

VNET peering must be created in both directions. The peering status should show Connected on both sides. Traffic won't flow until both sides are Connected.

Step 3: Configure DNS & Domain Join DR Replica

# Update DR VNET DNS to point to the primary DC
az network vnet update `
    --resource-group $drResourceGroup `
    --name vnet-sql-dr `
    --dns-servers 10.0.1.4

# Restart DR VM to pick up new DNS
az vm restart --resource-group $drResourceGroup --name sqlvm-dr1

RDP into sqlvm-dr1 and run:

# Verify DNS resolution to domain controller
nslookup sqlha.local
# Should resolve to 10.0.1.4

# Domain join
Add-Computer -DomainName "sqlha.local" -Credential (Get-Credential SQLHA\azureadmin) -Restart

After reboot, verify:

# Verify domain membership
(Get-WmiObject Win32_ComputerSystem).Domain
# Expected: sqlha.local

Step 4: Install Failover Clustering & Add to WSFC

RDP into sqlvm-dr1:

# Install Failover Clustering feature
Install-WindowsFeature Failover-Clustering -IncludeManagementTools

# Restart
Restart-Computer -Force

After reboot, from any existing cluster node (e.g., sqlvm-ha1):

# Add DR node to existing cluster
Add-ClusterNode -Name "sqlvm-dr1" -Cluster "sqlha-cluster"

# Verify — should show 3 nodes
Get-ClusterNode | Format-Table Name, State, NodeWeight

# Verify quorum — with 3 nodes, File Share Witness gives 4 votes (odd)
Get-ClusterQuorum | Format-List

🎯 Exam Focus

With 3 nodes + 1 File Share Witness = 4 votes. Majority quorum requires 3/4 votes, so you can lose 1 node. If you add a 4th replica later, quorum math changes. Always verify with Get-ClusterQuorum.

Step 5: Add Async Commit Replica to AG

On the current PRIMARY (connect via SSMS):

-- Step 1: Enable Always On on the DR replica first
-- (RDP into sqlvm-dr1, run in elevated PowerShell)
-- Enable-SqlAlwaysOn -ServerInstance "sqlvm-dr1" -Force

-- Step 2: On PRIMARY — add the async replica
ALTER AVAILABILITY GROUP [sqlha-ag]
ADD REPLICA ON N'sqlvm-dr1'
WITH (
    ENDPOINT_URL = N'TCP://sqlvm-dr1.sqlha.local:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,    -- ← KEY DIFFERENCE
    FAILOVER_MODE = MANUAL,                      -- ← Must be manual for async
    SEEDING_MODE = AUTOMATIC,                    -- Automatic initial seeding
    SECONDARY_ROLE (
        ALLOW_CONNECTIONS = READ_ONLY,           -- Read-only routing target
        READ_ONLY_ROUTING_URL = N'TCP://sqlvm-dr1.sqlha.local:1433'
    )
);

On sqlvm-dr1:

-- Join the replica to the AG
ALTER AVAILABILITY GROUP [sqlha-ag] JOIN;

-- Grant AG permission for automatic seeding
ALTER AVAILABILITY GROUP [sqlha-ag] GRANT CREATE ANY DATABASE;

Verify the replica is synchronizing:

-- On PRIMARY — check all replicas
SELECT
    r.replica_server_name,
    r.availability_mode_desc,
    r.failover_mode_desc,
    rs.role_desc,
    rs.synchronization_health_desc,
    rs.connected_state_desc
FROM sys.availability_replicas r
JOIN sys.dm_hadr_availability_replica_states rs
    ON r.replica_id = rs.replica_id
ORDER BY r.replica_server_name;

Expected output:

replica_server_name	availability_mode_desc	failover_mode_desc	role_desc	sync_health
sqlvm-ha1	SYNCHRONOUS_COMMIT	AUTOMATIC	PRIMARY	HEALTHY
sqlvm-ha2	SYNCHRONOUS_COMMIT	AUTOMATIC	SECONDARY	HEALTHY
sqlvm-dr1	ASYNCHRONOUS_COMMIT	MANUAL	SECONDARY	HEALTHY

Step 6: Test Cross-Region Failover

Measure the replication lag first (on PRIMARY):

-- Check send/redo queue sizes for each replica
SELECT
    ag.name AS ag_name,
    r.replica_server_name,
    rs.role_desc,
    drs.database_id,
    drs.synchronization_state_desc,
    drs.log_send_queue_size,         -- KB pending send
    drs.redo_queue_size,              -- KB pending redo on secondary
    drs.log_send_rate,                -- KB/sec being sent
    drs.redo_rate,                    -- KB/sec being applied
    drs.last_commit_time,
    DATEDIFF(SECOND, drs.last_commit_time, GETUTCDATE()) AS lag_seconds
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas r ON drs.replica_id = r.replica_id
JOIN sys.availability_groups ag ON r.group_id = ag.group_id
ORDER BY r.replica_server_name;

Manual failover to DR (forced, with potential data loss):

-- On sqlvm-dr1 — FORCE failover (accepts potential data loss)
ALTER AVAILABILITY GROUP [sqlha-ag] FORCE_FAILOVER_ALLOW_DATA_LOSS;

⚠️ Watch Out

FORCE_FAILOVER_ALLOW_DATA_LOSS is the only way to failover to an async replica. Any transactions not yet replicated are lost. After forced failover, the old primary will be in RESOLVING state and needs to be manually resumed.

After testing, failback to primary region:

-- On the old primary (sqlvm-ha1) — resume after forced failover
ALTER DATABASE [YourDB] SET HADR RESUME;

-- On sqlvm-dr1 (current primary) — failover back to original primary
-- First, wait for sync to catch up, then:
ALTER AVAILABILITY GROUP [sqlha-ag] FAILOVER;  -- Only works if target is sync commit

-- OR if sqlvm-ha1 is still async, force again:
ALTER AVAILABILITY GROUP [sqlha-ag] FORCE_FAILOVER_ALLOW_DATA_LOSS;

Step 7: Compare Sync vs Async Performance

Run this workload test on the PRIMARY to see the commit latency difference:

-- Create test table
CREATE TABLE dbo.LatencyTest (
    Id INT IDENTITY PRIMARY KEY,
    Payload NVARCHAR(200),
    InsertedAt DATETIME2 DEFAULT SYSUTCDATETIME()
);

-- Measure insert throughput (single-row inserts to show commit latency)
DECLARE @start DATETIME2 = SYSUTCDATETIME();
DECLARE @i INT = 0;
WHILE @i < 1000
BEGIN
    INSERT INTO dbo.LatencyTest (Payload)
    VALUES (REPLICATE('X', 200));
    SET @i += 1;
END
SELECT DATEDIFF(MILLISECOND, @start, SYSUTCDATETIME()) AS total_ms,
       DATEDIFF(MILLISECOND, @start, SYSUTCDATETIME()) / 1000.0 AS avg_ms_per_insert;

Compare with the replica lag:

-- Real-time view of replication latency
SELECT
    r.replica_server_name,
    r.availability_mode_desc,
    drs.log_send_queue_size AS send_queue_kb,
    drs.redo_queue_size AS redo_queue_kb,
    CASE
        WHEN drs.last_commit_time IS NULL THEN 'N/A'
        ELSE CAST(DATEDIFF(MILLISECOND, drs.last_commit_time, GETUTCDATE()) AS VARCHAR) + ' ms'
    END AS replication_lag
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas r ON drs.replica_id = r.replica_id
WHERE drs.is_local = 0;

Key Learnings: Same Region vs Cross Region

Aspect	Same Region (Sync)	Cross Region (Async)
Commit mode	Synchronous	Asynchronous
Failover mode	Automatic	Manual only
RPO	0 (zero data loss)	> 0 (depends on lag)
RTO	~10-30 seconds	Manual action + ~30 sec
Network latency	~1ms	~15-30ms (EU regions)
Impact on commits	Slight — must wait for remote harden	None — async fire and forget
Use case	HA within a region	DR across regions

Can an async commit replica use automatic failover?

Click to reveal answer

No. Async replicas can only use MANUAL failover (with FORCE_FAILOVER_ALLOW_DATA_LOSS). Automatic failover requires synchronous commit because the system must guarantee zero data loss.

1 / 3

Common Failures

Symptom	Likely Cause	Fix
Cross-region replica stuck SYNCHRONIZING but high latency	WAN bandwidth bottleneck on log block ship	Confirm async commit; size ExpressRoute / VPN; monitor `log_send_queue_size`
Forced failover lost transactions	Async secondary was behind by N MB of log	Document RPO at this lag level; alert when `log_send_queue_size_kb` > threshold
AG won't fail back: "database not in valid state"	Old primary was force-failed; needs RESUME	`ALTER DATABASE [DB] SET HADR RESUME` on old primary
Listener returns wrong IP after region failover	DNS TTL too long; client cached old IP	Lower TTL on listener DNS to 30s; add `MultiSubnetFailover=True` to connection string
Cluster split-brain after WAN partition	No witness or witness in same partition	Place Cloud Witness in a 3rd region

⚠️ Watch Out

Async commit + automatic failover = forbidden combo. SQL won't let you set both. Auto-failover requires sync commit, which requires acceptable WAN latency (< 10ms ideally). Cross-region is almost always async + manual failover.

Variations to Try

Add a second async replica in a 3rd region for additional DR redundancy
Convert to distributed AG to chain two AGs across regions (no single-cluster span)
Use Read-only routing so cross-region reporting workloads land on the DR replica
Place backups on async secondary to keep them out of the primary region
Test forced failover quarterly, document RPO/RTO observed

Production Application

DR for tier-1 OLTP when PaaS Failover Group isn't viable (legacy SQL features, CLR, agent jobs).
Regulated workload requiring cross-region recovery without leaving IaaS (custom encryption, third-party agents).
Active-passive cross-region — keep DR cold-warm via async; pay only for VM compute, not duplicated PaaS S-tier.
Anti-pattern: don't pick this for sub-second cross-region failover — use a distributed system (Cosmos DB, etc.) at the data tier instead.

Clean Up

# Remove the DR replica from AG (on current PRIMARY)
# ALTER AVAILABILITY GROUP [sqlha-ag] REMOVE REPLICA ON N'sqlvm-dr1';

# Remove DR node from cluster
# Remove-ClusterNode -Name "sqlvm-dr1" -Force

# Delete DR VM
az vm delete --resource-group rg-sql-ha --name sqlvm-dr1 --yes

# Remove peering
az network vnet peering delete --resource-group rg-sql-ha --name peer-primary-to-dr --vnet-name vnet-sql-ha
az network vnet peering delete --resource-group rg-sql-ha --name peer-dr-to-primary --vnet-name vnet-sql-dr

# Delete DR VNET and NSG
az network vnet delete --resource-group rg-sql-ha --name vnet-sql-dr
az network nsg delete --resource-group rg-sql-ha --name nsg-sql-dr

What's Next?

Direction	Lab
Try managed HA instead of VMs	SQL MI Flavors & Replicas
Compare PaaS DR options	Geo-Replication & Failover
Try all migration paths	Complete Migration Scenarios

Why Cross-Region?​

Architecture​

What Gets Deployed​

Prerequisites​

Lab Steps​

Step 1: Deploy DR Infrastructure​

Step 2: Configure VNET Peering​

Step 3: Configure DNS & Domain Join DR Replica​

Step 4: Install Failover Clustering & Add to WSFC​

Step 5: Add Async Commit Replica to AG​

Step 6: Test Cross-Region Failover​

Step 7: Compare Sync vs Async Performance​

Key Learnings: Same Region vs Cross Region​

Common Failures​

Variations to Try​

Production Application​

Clean Up​

What's Next?​

Why Cross-Region?

Architecture

What Gets Deployed

Prerequisites

Lab Steps

Step 1: Deploy DR Infrastructure

Step 2: Configure VNET Peering

Step 3: Configure DNS & Domain Join DR Replica

Step 4: Install Failover Clustering & Add to WSFC

Step 5: Add Async Commit Replica to AG

Step 6: Test Cross-Region Failover

Step 7: Compare Sync vs Async Performance

Key Learnings: Same Region vs Cross Region

Common Failures

Variations to Try

Production Application

Clean Up

What's Next?