Lab: SQL Server Always On Availability Groups on Azure VMs

Complete, production-ready guide — deploy a 2-node SQL Server 2022 Always On AG on Azure VMs from scratch, with automatic failover, zero data loss, Internal Load Balancer, and a real-time failover testing web app.

DP-300 Domain: 5 — Plan and Configure HA/DR (20-25%)

Architecture

AlwaysOn HA setup lab: 2-node SQL 2022 AG with WSFC, ILB-fronted listener, dc-vm hosting AD DS + DNS + File Share Witness + Flask test app

What Gets Deployed

Component	Resource	Purpose
VNET	`vnet-sql-ha` (10.0.0.0/16)	Network isolation
NSG	`nsg-sql-ha`	Firewall rules for RDP, SQL, HADR
Domain Controller	`dc-vm` (10.0.1.4)	AD DS + DNS + File Share Witness
SQL VM 1	`sqlvm-ha1` (10.0.1.10)	SQL Server 2022 Enterprise replica
SQL VM 2	`sqlvm-ha2` (10.0.1.11)	SQL Server 2022 Enterprise replica
Load Balancer	`ilb-sqlha` (10.0.1.30)	Routes SQL traffic to AG primary
Flask App	`dc-vm:8080`	Real-time failover testing dashboard

Lab Steps (7 Parts)

Deploy Infrastructure

Bicep template deploys VNET, NSG, 3 VMs (1 DC + 2 SQL) with static IPs

Install Active Directory

Promote dc-vm to Domain Controller, configure DNS, join SQL VMs to domain

Configure DNS & Domain Join

Point VNET DNS to DC, domain-join both SQL VMs, configure service accounts

Install Failover Clustering

Install WSFC feature on both SQL nodes, validate cluster prerequisites

Create Cluster & Quorum

Create WSFC cluster with static IP, configure File Share Witness for quorum

Configure Quorum

Set quorum mode, verify voting, test witness accessibility

Enable Always On & Create AG

Enable AG feature, create AG with sync commit + automatic failover, configure ILB listener

Step 1: Deploy Infrastructure

# Variables
$resourceGroup = "rg-sql-ha"
$location = "westeurope"

# Create resource group
az group create --name $resourceGroup --location $location

# Deploy infrastructure (2 SQL VMs + networking)
az deployment group create `
    --resource-group $resourceGroup `
    --template-file sql-ha-vms.bicep `
    --parameters location=$location adminUsername=azureadmin adminPassword='<YOUR_PASSWORD>'

# Deploy Domain Controller VM
az vm create `
    --resource-group $resourceGroup `
    --name dc-vm `
    --image Win2022Datacenter `
    --size Standard_D2ads_v6 `
    --admin-username azureadmin `
    --admin-password '<YOUR_PASSWORD>' `
    --vnet-name vnet-sql-ha `
    --subnet sql-subnet `
    --private-ip-address 10.0.1.4 `
    --public-ip-sku Standard `
    --nsg nsg-sql-ha

Step 2: Install Active Directory

RDP into dc-vm and run:

# Install AD DS
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools

# Promote to Domain Controller
Install-ADDSForest `
    -DomainName "sqlha.local" `
    -DomainNetBIOSName "SQLHA" `
    -InstallDns `
    -SafeModeAdministratorPassword (ConvertTo-SecureString "<DSRM_PASSWORD>" -AsPlainText -Force) `
    -Force

Step 3: Configure DNS & Domain Join

# Update VNET DNS to point to DC
az network vnet update `
    --resource-group rg-sql-ha `
    --name vnet-sql-ha `
    --dns-servers 10.0.1.4

# Restart SQL VMs to pick up new DNS
az vm restart --resource-group rg-sql-ha --name sqlvm-ha1
az vm restart --resource-group rg-sql-ha --name sqlvm-ha2

On each SQL VM:

# Join domain
Add-Computer -DomainName "sqlha.local" -Credential (Get-Credential SQLHA\azureadmin) -Restart

Step 4: Install Failover Clustering

On both SQL VMs:

Install-WindowsFeature Failover-Clustering -IncludeManagementTools

Step 5: Create Cluster

From either SQL VM:

# Validate cluster prerequisites
Test-Cluster -Node sqlvm-ha1,sqlvm-ha2

# Create cluster (no shared storage in Azure)
New-Cluster -Name sqlha-cluster `
    -Node sqlvm-ha1,sqlvm-ha2 `
    -StaticAddress 10.0.1.20 `
    -NoStorage

⚠️ Watch Out

Always use -NoStorage. Azure VMs don't share storage for traditional FCI. Always On AG uses database-level replication instead of shared disks.

On dc-vm:

# Create witness share
New-Item -Path C:\ClusterWitness -ItemType Directory
New-SmbShare -Name ClusterWitness -Path C:\ClusterWitness -FullAccess "SQLHA\sqlha-cluster$"

From a SQL VM:

Set-ClusterQuorum -FileShareWitness "\\dc-vm\ClusterWitness"

Step 7: Enable Always On & Create AG

# Enable AG on both SQL instances
Enable-SqlAlwaysOn -ServerInstance "sqlvm-ha1" -Force
Enable-SqlAlwaysOn -ServerInstance "sqlvm-ha2" -Force

# Create endpoints, AG, and listener via T-SQL (run on primary)

-- Create database
CREATE DATABASE TestHADB;
ALTER DATABASE TestHADB SET RECOVERY FULL;
BACKUP DATABASE TestHADB TO DISK = 'NUL';

-- Create AG endpoint on both nodes (run on each)
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL);

-- Create Availability Group (run on primary)
CREATE AVAILABILITY GROUP [sqlha-ag]
WITH (AUTOMATED_BACKUP_PREFERENCE = PRIMARY)
FOR DATABASE [TestHADB]
REPLICA ON
    N'sqlvm-ha1' WITH (
        ENDPOINT_URL = N'TCP://sqlvm-ha1.sqlha.local:5022',
        FAILOVER_MODE = AUTOMATIC,
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT),
    N'sqlvm-ha2' WITH (
        ENDPOINT_URL = N'TCP://sqlvm-ha2.sqlha.local:5022',
        FAILOVER_MODE = AUTOMATIC,
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- Join secondary (run on secondary)
ALTER AVAILABILITY GROUP [sqlha-ag] JOIN;
ALTER DATABASE [TestHADB] SET HADR AVAILABILITY GROUP = [sqlha-ag];

Verification

# Check AG health
SELECT
    ag.name AS ag_name,
    ars.role_desc,
    ars.synchronization_health_desc,
    ar.replica_server_name
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar ON ars.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id;

Expected:

ag_name	role_desc	synchronization_health	replica_server
sqlha-ag	PRIMARY	HEALTHY	sqlvm-ha1
sqlha-ag	SECONDARY	HEALTHY	sqlvm-ha2

Common Failures

Symptom	Likely Cause	Fix
`New-Cluster` fails: "node not reachable"	NSG blocks cluster ports (3343, 5022, 59999)	Open intra-subnet RDP/cluster ports between VMs
Listener IP unreachable from app	ILB health probe (TCP 59999) not responding on primary	Verify probe rule on SQL with `EnableAlwaysOn` + correct floating IP
AG stuck in RESOLVING after failover	Quorum lost; only 1 node sees cluster	Force quorum on surviving node, then re-add others
`ALTER AVAILABILITY GROUP ... JOIN` fails on secondary	Backup/restore chain broken or LSN mismatch	Re-seed: full backup + log backup + restore WITH NORECOVERY
Failover takes >30 sec	Sync replica is unhealthy or async-only	Confirm sync replica is `SYNCHRONIZED` before relying on auto-failover

⚠️ Watch Out

SQL Server service account must be domain account, not LocalSystem. AG uses Kerberos for inter-node auth. LocalSystem cannot register SPNs against the cluster name.

Variations to Try

Add a 3rd async replica in another region → see Cross-Region AlwaysOn
Switch to distributed AG (forwarder pattern) for cross-AG replication
Replace ILB with DNN listener (SQL 2019+) — no LB needed
Add read-only routing to offload reports to the secondary
Configure backups on secondary (BACKUP_PRIORITY) to offload primary IO

Production Application

Tier-1 OLTP, single region, RPO 0 / RTO < 30 sec → this exact pattern (2 sync replicas, 1 file-share or cloud witness, ILB listener, automatic failover).
Mission-critical SQL on IaaS where PaaS isn't an option (CLR, cross-DB transactions, agent jobs) — AG is the IaaS HA story.
Compliance requires self-managed encryption keys + HA — AG + EKM provider.
Anti-pattern: don't use this for DR-only across regions — add an async replica or use a separate FOG/distributed AG instead.

Clean Up

az group delete --name rg-sql-ha --yes --no-wait

🏢 Real-World DBA Note

Source code & full step-by-step guide: The complete project with Bicep templates, PowerShell scripts, draw.io architecture diagrams, and a Flask failover-testing app is available at github.com/ehabmostafa_microsoft/AlwaysOn-HA-Setup.

Key Takeaways

Why does Always On AG require Active Directory?

Click to reveal answer

WSFC (Windows Server Failover Clustering) requires all nodes to be domain-joined. The cluster uses AD for authentication, computer objects, and the Cluster Name Object (CNO). AG is built on top of WSFC.

1 / 4

Architecture​

What Gets Deployed​

Lab Steps (7 Parts)​

Step 1: Deploy Infrastructure​

Step 2: Install Active Directory​

Step 3: Configure DNS & Domain Join​

Step 4: Install Failover Clustering​

Step 5: Create Cluster​

Step 6: Configure Quorum (File Share Witness)​

Step 7: Enable Always On & Create AG​

Verification​

Common Failures​

Variations to Try​

Production Application​

Clean Up​

Key Takeaways​