Skip to main content

Lab: SQL Server Always On Availability Groups on Azure VMs

Complete, production-ready guide — deploy a 2-node SQL Server 2022 Always On AG on Azure VMs from scratch, with automatic failover, zero data loss, Internal Load Balancer, and a real-time failover testing web app.

DP-300 Domain: 5 — Plan and Configure HA/DR (20-25%)

Architecture

AlwaysOn HA setup lab: 2-node SQL 2022 AG with WSFC, ILB-fronted listener, dc-vm hosting AD DS + DNS + File Share Witness + Flask test app

What Gets Deployed

ComponentResourcePurpose
VNETvnet-sql-ha (10.0.0.0/16)Network isolation
NSGnsg-sql-haFirewall rules for RDP, SQL, HADR
Domain Controllerdc-vm (10.0.1.4)AD DS + DNS + File Share Witness
SQL VM 1sqlvm-ha1 (10.0.1.10)SQL Server 2022 Enterprise replica
SQL VM 2sqlvm-ha2 (10.0.1.11)SQL Server 2022 Enterprise replica
Load Balancerilb-sqlha (10.0.1.30)Routes SQL traffic to AG primary
Flask Appdc-vm:8080Real-time failover testing dashboard

Lab Steps (7 Parts)

Deploy Infrastructure
Bicep template deploys VNET, NSG, 3 VMs (1 DC + 2 SQL) with static IPs
Install Active Directory
Promote dc-vm to Domain Controller, configure DNS, join SQL VMs to domain
Configure DNS & Domain Join
Point VNET DNS to DC, domain-join both SQL VMs, configure service accounts
Install Failover Clustering
Install WSFC feature on both SQL nodes, validate cluster prerequisites
Create Cluster & Quorum
Create WSFC cluster with static IP, configure File Share Witness for quorum
Configure Quorum
Set quorum mode, verify voting, test witness accessibility
Enable Always On & Create AG
Enable AG feature, create AG with sync commit + automatic failover, configure ILB listener

Step 1: Deploy Infrastructure

# Variables
$resourceGroup = "rg-sql-ha"
$location = "westeurope"

# Create resource group
az group create --name $resourceGroup --location $location

# Deploy infrastructure (2 SQL VMs + networking)
az deployment group create `
--resource-group $resourceGroup `
--template-file sql-ha-vms.bicep `
--parameters location=$location adminUsername=azureadmin adminPassword='<YOUR_PASSWORD>'

# Deploy Domain Controller VM
az vm create `
--resource-group $resourceGroup `
--name dc-vm `
--image Win2022Datacenter `
--size Standard_D2ads_v6 `
--admin-username azureadmin `
--admin-password '<YOUR_PASSWORD>' `
--vnet-name vnet-sql-ha `
--subnet sql-subnet `
--private-ip-address 10.0.1.4 `
--public-ip-sku Standard `
--nsg nsg-sql-ha

Step 2: Install Active Directory

RDP into dc-vm and run:

# Install AD DS
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools

# Promote to Domain Controller
Install-ADDSForest `
-DomainName "sqlha.local" `
-DomainNetBIOSName "SQLHA" `
-InstallDns `
-SafeModeAdministratorPassword (ConvertTo-SecureString "<DSRM_PASSWORD>" -AsPlainText -Force) `
-Force

Step 3: Configure DNS & Domain Join

# Update VNET DNS to point to DC
az network vnet update `
--resource-group rg-sql-ha `
--name vnet-sql-ha `
--dns-servers 10.0.1.4

# Restart SQL VMs to pick up new DNS
az vm restart --resource-group rg-sql-ha --name sqlvm-ha1
az vm restart --resource-group rg-sql-ha --name sqlvm-ha2

On each SQL VM:

# Join domain
Add-Computer -DomainName "sqlha.local" -Credential (Get-Credential SQLHA\azureadmin) -Restart

Step 4: Install Failover Clustering

On both SQL VMs:

Install-WindowsFeature Failover-Clustering -IncludeManagementTools

Step 5: Create Cluster

From either SQL VM:

# Validate cluster prerequisites
Test-Cluster -Node sqlvm-ha1,sqlvm-ha2

# Create cluster (no shared storage in Azure)
New-Cluster -Name sqlha-cluster `
-Node sqlvm-ha1,sqlvm-ha2 `
-StaticAddress 10.0.1.20 `
-NoStorage
⚠️ Watch Out

Always use -NoStorage. Azure VMs don't share storage for traditional FCI. Always On AG uses database-level replication instead of shared disks.

Step 6: Configure Quorum (File Share Witness)

On dc-vm:

# Create witness share
New-Item -Path C:\ClusterWitness -ItemType Directory
New-SmbShare -Name ClusterWitness -Path C:\ClusterWitness -FullAccess "SQLHA\sqlha-cluster$"

From a SQL VM:

Set-ClusterQuorum -FileShareWitness "\\dc-vm\ClusterWitness"

Step 7: Enable Always On & Create AG

# Enable AG on both SQL instances
Enable-SqlAlwaysOn -ServerInstance "sqlvm-ha1" -Force
Enable-SqlAlwaysOn -ServerInstance "sqlvm-ha2" -Force

# Create endpoints, AG, and listener via T-SQL (run on primary)
-- Create database
CREATE DATABASE TestHADB;
ALTER DATABASE TestHADB SET RECOVERY FULL;
BACKUP DATABASE TestHADB TO DISK = 'NUL';

-- Create AG endpoint on both nodes (run on each)
CREATE ENDPOINT [Hadr_endpoint]
STATE = STARTED
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (ROLE = ALL);

-- Create Availability Group (run on primary)
CREATE AVAILABILITY GROUP [sqlha-ag]
WITH (AUTOMATED_BACKUP_PREFERENCE = PRIMARY)
FOR DATABASE [TestHADB]
REPLICA ON
N'sqlvm-ha1' WITH (
ENDPOINT_URL = N'TCP://sqlvm-ha1.sqlha.local:5022',
FAILOVER_MODE = AUTOMATIC,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT),
N'sqlvm-ha2' WITH (
ENDPOINT_URL = N'TCP://sqlvm-ha2.sqlha.local:5022',
FAILOVER_MODE = AUTOMATIC,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- Join secondary (run on secondary)
ALTER AVAILABILITY GROUP [sqlha-ag] JOIN;
ALTER DATABASE [TestHADB] SET HADR AVAILABILITY GROUP = [sqlha-ag];

Verification

# Check AG health
SELECT
ag.name AS ag_name,
ars.role_desc,
ars.synchronization_health_desc,
ar.replica_server_name
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar ON ars.replica_id = ar.replica_id
JOIN sys.availability_groups ag ON ar.group_id = ag.group_id;

Expected:

ag_namerole_descsynchronization_healthreplica_server
sqlha-agPRIMARYHEALTHYsqlvm-ha1
sqlha-agSECONDARYHEALTHYsqlvm-ha2

Common Failures

SymptomLikely CauseFix
New-Cluster fails: "node not reachable"NSG blocks cluster ports (3343, 5022, 59999)Open intra-subnet RDP/cluster ports between VMs
Listener IP unreachable from appILB health probe (TCP 59999) not responding on primaryVerify probe rule on SQL with EnableAlwaysOn + correct floating IP
AG stuck in RESOLVING after failoverQuorum lost; only 1 node sees clusterForce quorum on surviving node, then re-add others
ALTER AVAILABILITY GROUP ... JOIN fails on secondaryBackup/restore chain broken or LSN mismatchRe-seed: full backup + log backup + restore WITH NORECOVERY
Failover takes >30 secSync replica is unhealthy or async-onlyConfirm sync replica is SYNCHRONIZED before relying on auto-failover
⚠️ Watch Out

SQL Server service account must be domain account, not LocalSystem. AG uses Kerberos for inter-node auth. LocalSystem cannot register SPNs against the cluster name.

Variations to Try

  • Add a 3rd async replica in another region → see Cross-Region AlwaysOn
  • Switch to distributed AG (forwarder pattern) for cross-AG replication
  • Replace ILB with DNN listener (SQL 2019+) — no LB needed
  • Add read-only routing to offload reports to the secondary
  • Configure backups on secondary (BACKUP_PRIORITY) to offload primary IO

Production Application

  • Tier-1 OLTP, single region, RPO 0 / RTO < 30 sec → this exact pattern (2 sync replicas, 1 file-share or cloud witness, ILB listener, automatic failover).
  • Mission-critical SQL on IaaS where PaaS isn't an option (CLR, cross-DB transactions, agent jobs) — AG is the IaaS HA story.
  • Compliance requires self-managed encryption keys + HA — AG + EKM provider.
  • Anti-pattern: don't use this for DR-only across regions — add an async replica or use a separate FOG/distributed AG instead.

Clean Up

az group delete --name rg-sql-ha --yes --no-wait
🏢 Real-World DBA Note

Source code & full step-by-step guide: The complete project with Bicep templates, PowerShell scripts, draw.io architecture diagrams, and a Flask failover-testing app is available at github.com/ehabmostafa_microsoft/AlwaysOn-HA-Setup.

Key Takeaways

Why does Always On AG require Active Directory?
Click to reveal answer
WSFC (Windows Server Failover Clustering) requires all nodes to be domain-joined. The cluster uses AD for authentication, computer objects, and the Cluster Name Object (CNO). AG is built on top of WSFC.
1 / 4