Log Shipping, Failover Cluster Instances & Quorum

HADR Options — Complete Reference

HADR Options Overview

📦

Log Shipping

Backup→Copy→Restore cycle. Simple, Standard Edition. Manual failover, 15 min RPO.

🔄

Always On AG

DB-level HA. Sync/async replicas, auto failover, readable secondaries. Enterprise on-prem.

🏢

FCI

Instance-level HA. Shared storage, system DBs included. Requires WSFC + shared disks.

🌍

Failover Groups

Azure PaaS DR. Auto failover, single endpoint, cross-region. For SQL DB and MI.

Log Shipping (SQL VM Only)

The simplest DR solution — automated backup → copy → restore cycle.

Architecture

Log Shipping — 3-Job Cycle

💾

Backup Job (Primary)

Backs up transaction log to file share

Every 15 minutes (default)

📤

Copy Job (Secondary)

Copies .trn files from share to local

Runs on secondary server

🔄

Restore Job (Secondary)

Restores .trn files in order

NORECOVERY (not readable) or STANDBY (read-only between restores)

Three Jobs of Log Shipping

Job	Runs On	Action	Default Interval
Backup	Primary	Backs up transaction log → file share	15 minutes
Copy	Secondary	Copies .trn files from share to local	15 minutes
Restore	Secondary	Restores .trn files on secondary DB	15 minutes
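The three intervals in the table above combine into a rough upper bound on how far the secondary can trail the primary. The sketch below is a simplified model rather than a measurement: it assumes each job runs exactly on schedule, ignores job run time, and assumes the backup share survives the loss of the primary. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogShippingSchedule:
    """Job intervals in minutes (15 is the default for each job)."""
    backup_every: int = 15
    copy_every: int = 15
    restore_every: int = 15


def worst_case_data_loss_minutes(s: LogShippingSchedule) -> int:
    # Changes made since the last log backup exist only on the primary,
    # so losing the primary can lose up to one backup interval.
    return s.backup_every


def worst_case_secondary_lag_minutes(s: LogShippingSchedule) -> int:
    # A change can wait a full interval for each job in turn
    # (backup, then copy, then restore) before the secondary has it.
    return s.backup_every + s.copy_every + s.restore_every


default = LogShippingSchedule()
print(worst_case_data_loss_minutes(default))      # 15
print(worst_case_secondary_lag_minutes(default))  # 45
```

Under these assumptions, shortening only the backup interval tightens the data-loss bound, while the staleness of a STANDBY reporting copy depends on all three intervals.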

Restore Modes

Mode	Secondary Readable?	Behavior
NORECOVERY	❌	DB in Restoring state, cannot be queried
STANDBY	✅ (read-only)	Users disconnected during each restore cycle, then read access resumes

🎯 Exam Focus

Log Shipping key facts: 1) NOT automatic failover: a manual role change is required. 2) RPO = backup interval (typically 15 min). 3) RTO = manual (could be hours). 4) Simpler than AG but less capable. 5) Works on Standard Edition (full AG requires Enterprise on-prem; Standard gets Basic AG only).

🏢 Real-World DBA Note

Oracle DBA parallel: Log Shipping = Oracle Data Guard in Maximum Performance mode with manual archive log shipping. The backup-copy-restore cycle is like manually shipping archive logs to a standby. Always On AG is far more capable (like Data Guard in Maximum Availability mode with Fast-Start Failover providing automatic failover).

Log Shipping setup — enable order

Configure Log Shipping

Set DB to FULL recovery

ALTER DATABASE ... SET RECOVERY FULL

Take a full backup first — chain starts here

Provision shared file share

Both SQL service accounts need read/write

On Azure: Premium File Share or shared VM SMB

Configure on primary

Enable log shipping in DB Properties

Set backup job interval (default 15 min)

Configure on secondary

Initialize secondary from full backup

Pick NORECOVERY or STANDBY mode

Define copy + restore jobs

Optional: monitor server

Central log_shipping_monitor instance

Alert on backup/copy/restore lag

Common ordering trap

Do not initialize the secondary from a backup taken before you switched the database to the FULL recovery model. The log chain does not start from that backup, so the restore job fails on the first .trn. Always: SET RECOVERY FULL → take fresh full backup → restore to secondary WITH NORECOVERY → then enable log shipping jobs.
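The ordering trap comes down to log-chain continuity: each .trn file must pick up at or before the point the secondary has already been restored to. The sketch below illustrates that check with a deliberately simplified model. LSNs are shown as small integers, and `LogBackup` and `first_chain_break` are hypothetical names, not SQL Server objects.

```python
from typing import List, NamedTuple, Optional


class LogBackup(NamedTuple):
    file_name: str
    first_lsn: int  # simplified: real LSNs are much larger numbers
    last_lsn: int


def first_chain_break(restored_to_lsn: int,
                      backups: List[LogBackup]) -> Optional[str]:
    """Return the first .trn file that cannot be applied, or None if the chain is intact."""
    position = restored_to_lsn
    for backup in sorted(backups, key=lambda b: b.first_lsn):
        if backup.first_lsn > position:
            # Gap: this file starts after the point the secondary has reached.
            return backup.file_name
        position = max(position, backup.last_lsn)
    return None


chain = [
    LogBackup("a.trn", 90, 120),
    LogBackup("b.trn", 120, 150),
    LogBackup("c.trn", 160, 190),  # log between 150 and 160 is missing
]
print(first_chain_break(100, chain))      # c.trn
print(first_chain_break(100, chain[:2]))  # None
print(first_chain_break(50, chain))       # a.trn
```

The last call mirrors the trap: the secondary was initialized from a backup that ends before the first log backup begins, so the very first .trn is rejected.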

Log Shipping vs Always On AG

Feature	Log Shipping	Always On AG
Auto failover	❌ Manual	✅ Automatic
Data loss (RPO)	Minutes (backup interval)	0 (sync) or seconds (async)
Read secondary	✅ (STANDBY mode, interrupted)	✅ (readable secondary, no restore interruptions)
Monitoring	Alert job + history tables	AG dashboard + DMVs
SQL Edition	Standard + Enterprise	Enterprise for full AG (on-prem and SQL VMs); Standard limited to Basic AG
Failover time	Manual — hours	Seconds (automatic)
Max secondaries	Unlimited	8 (9 replicas including the primary)
Configuration	Simple	Complex (WSFC + endpoints)

Failover Cluster Instance (FCI) on Azure VMs

Provides instance-level HA (whole SQL Server instance fails over), unlike AG which is database-level.

Architecture

Failover Cluster Instance (FCI) — Shared Storage

🖥️

Active Node

Runs SQL Server. Owns the shared storage and cluster IP. Only one node active at a time.

💾

Shared Storage

Azure Shared Disks, Storage Spaces Direct, or Premium File Share. Both nodes access same data.

💻

Passive Node

Standby. On failover, starts SQL Server and attaches shared storage. System DBs transfer automatically.

FCI vs AG — Key Differences

Aspect	FCI	AG
Scope	Entire SQL instance	Per-database
Shared storage	Required	Not needed (each replica keeps its own copy on local storage)
Both nodes run SQL?	❌ (passive is standby)	✅ (every replica runs its own instance; secondary can be readable)
System databases	✅ Shared (master, msdb)	❌ Not replicated
SQL Agent jobs	✅ Shared (in msdb)	❌ Must sync manually
Instance-level objects	✅ (logins, linked servers)	❌ Must recreate
Read-only secondary	❌	✅
Cross-region DR	Complex	✅ (async replica)

Shared Storage Options on Azure

Option	Performance	Complexity	Cost
Azure Shared Disks	High (Premium SSD/Ultra)	Low	Medium
Storage Spaces Direct (S2D)	Very High	High	High
Premium File Share (SMB 3.0)	Medium	Low	Low

🎯 Exam Focus

When to use FCI over AG:

You need instance-level failover (logins, jobs, linked servers all fail over together)
You have Standard Edition (Basic AG on Standard supports only one database per group)
You need system database failover

When to use AG: everything else (it's more flexible and doesn't need shared storage)

FCI on Azure VMs — setup order

Build a SQL FCI on Azure

AD + WSFC foundation

Domain-joined VMs in the same availability set or spread across availability zones

Install Failover Clustering, create cluster -NoStorage

Add Cloud Witness

Storage account in a different region

Required for 2-node clusters

Provision shared storage

Azure Shared Disks (simplest), S2D, or Premium File Share

Add as Cluster Shared Volume / Available Storage

Install SQL as cluster role

Setup.exe → "New SQL Server failover cluster installation" on node 1

"Add node" on node 2 (uses same cluster name)

Configure ILB for VNN

ILB frontend = SQL VNN IP, floating IP enabled

Health probe on a dedicated port (59999 is the usual example), as for an AG listener

Common ordering trap

Running standalone SQL setup first and then trying to "convert" to FCI is not supported. The first SQL install must be the clustered install path. Standalone-then-cluster forces a full uninstall/reinstall.

Quorum — Deep Dive

Quorum prevents split-brain — ensures only one partition of the cluster keeps running.

How Quorum Voting Works

Quorum Voting

🗳️

Majority Rule

floor(N/2) + 1 votes needed. 2 nodes + witness = 3 votes → survives 1 failure (2/3 majority).

☁️

Cloud Witness

Recommended for Azure. Blob in storage account. Region-independent tiebreaker. No extra VM.

⚠️

2 Nodes, No Witness

CANNOT survive any failure! Neither node has majority (1/2). Always add a witness.

Quorum Rule

A majority (floor(N/2) + 1) of total votes must remain online and in contact for the cluster to stay online.

Cluster Config	Total Votes	Node Fails	Remaining Votes	Survives?
2 nodes, no witness	2	1 node	1/2	❌ No majority
2 nodes + Cloud Witness	3	1 node	2/3	✅ Majority
3 nodes, no witness	3	1 node	2/3	✅ Majority
3 nodes + Cloud Witness	4	1 node	3/4	✅ Majority
3 nodes + Cloud Witness	4	2 nodes	2/4	❌ No majority (static votes; dynamic quorum may survive sequential failures)
4 nodes + Cloud Witness	5	2 nodes	3/5	✅ Majority
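The majority rule can be checked with a few lines of arithmetic. The sketch below is a simplified static-vote model: it assumes one vote per node plus one for the witness, a witness that stays reachable, and no dynamic quorum adjustments (which real Windows clusters may apply). The function names are illustrative and not part of any cluster API.

```python
def votes_required(total_votes: int) -> int:
    """Strict majority: floor(N/2) + 1."""
    return total_votes // 2 + 1


def has_quorum(nodes: int, witness: bool, failed_nodes: int) -> bool:
    """Static-vote model: every node and the witness hold one vote each."""
    total = nodes + (1 if witness else 0)
    remaining = total - failed_nodes
    return remaining >= votes_required(total)


print(has_quorum(nodes=2, witness=False, failed_nodes=1))  # False: 1 of 2
print(has_quorum(nodes=2, witness=True, failed_nodes=1))   # True: 2 of 3
print(has_quorum(nodes=4, witness=True, failed_nodes=2))   # True: 3 of 5
```

Exactly half of the votes is never enough in this model, which is why an even vote count calls for a witness as tiebreaker.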

Witness Types

Witness	Where Stored	When to Use
Cloud Witness ✅	Azure Storage Account	Recommended for Azure VMs — region-independent, no extra VM
File Share Witness	SMB file share	Hybrid setups with an on-prem file server
Disk Witness	Shared cluster disk	Traditional on-prem with shared SAN

🎯 Exam Focus

Critical quorum facts for DP-300:

Cloud Witness is recommended for Azure clusters — it's a blob in a storage account
2-node cluster WITHOUT witness = CANNOT survive any failure (neither node has majority alone)
Even number of nodes → always add a witness (tiebreaker)
Witness should be in a different region from the cluster nodes for maximum resilience

Monitoring HA/DR Solutions

What to Monitor	How	Alert Threshold
AG synchronization state	`sys.dm_hadr_database_replica_states`	`NOT SYNCHRONIZING`
AG replica role	`sys.dm_hadr_availability_replica_states`	Unexpected role change
Log send/redo queue	`sys.dm_hadr_database_replica_states`	> acceptable RPO
Failover group lag	Azure Monitor metrics	Replication lag > threshold
Log shipping status	`msdb..log_shipping_monitor_*` tables	Backup/restore older than threshold
Cluster health	Failover Cluster Manager events	Node offline
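Most of the thresholds in the table reduce to "alert when the last success is older than X". The sketch below shows that comparison for the log shipping row. The timestamps are made-up sample values (in practice they would be read from the msdb log shipping monitor tables, which this sketch does not query), and the 45-minute threshold is only an assumed example.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_steps(last_success: Dict[str, datetime], now: datetime,
                threshold: timedelta) -> List[str]:
    """Return the log shipping steps whose last success is older than the threshold."""
    return sorted(step for step, finished in last_success.items()
                  if now - finished > threshold)


now = datetime(2024, 1, 1, 12, 0)
last_success = {
    "backup": datetime(2024, 1, 1, 11, 50),   # 10 minutes ago
    "copy": datetime(2024, 1, 1, 11, 48),     # 12 minutes ago
    "restore": datetime(2024, 1, 1, 11, 0),   # 60 minutes ago
}
print(stale_steps(last_success, now, timedelta(minutes=45)))  # ['restore']
```

A stale restore step with healthy backup and copy steps points at the secondary rather than the primary or the share.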

Anti-Patterns

"Log shipping = HA." Log shipping is DR / read-only reporting, not HA: failover is manual and RPO is bounded by the backup-job interval (typically 15 min). For HA use AG or FCI.
"FCI shared disk on Azure VMs = use Premium SSD with each VM." FCI requires shared storage. On Azure that means Azure Shared Disks, Storage Spaces Direct (S2D), or a Premium File Share. A separate, unshared Premium SSD on each VM does NOT work on its own for FCI.
"3 nodes in cluster = quorum is fine." With 3 nodes and Node Majority (no witness), losing 2 nodes loses quorum. Use Node Majority + Cloud Witness for cross-region clusters or even node counts, so a tiebreaker vote survives node loss.
"Cloud Witness in same region as cluster." Defeats the purpose for region-failure scenarios. Place Cloud Witness in a separate region for DR-aware quorum.
"Log Shipping STANDBY mode — users can read forever." STANDBY disconnects users during each restore cycle. For 24/7 reporting use AG read-only routing instead.
"AG without listener — apps connect to current primary directly." That breaks on failover. Always configure listener + use it in connection strings.
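For the listener anti-pattern, the practical fix is in the connection string. The sketch below only builds an ODBC-style string and does not open a connection. The listener and database names are hypothetical, and it assumes the ODBC Driver 18 for SQL Server with Windows authentication. `ApplicationIntent=ReadOnly` only takes effect if read-only routing has been configured on the AG.

```python
def listener_connection_string(listener: str, database: str,
                               read_only: bool = False, port: int = 1433) -> str:
    """Build a connection string that targets the AG listener, not a node name."""
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{listener},{port}",
        f"Database={database}",
        "MultiSubnetFailover=Yes",   # faster reconnect after a failover
        "Trusted_Connection=Yes",    # assumption: Windows authentication
    ]
    if read_only:
        # Asks the listener to route this session to a readable secondary.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


# "ag1-listener" and "SalesDb" are placeholder names for illustration.
print(listener_connection_string("ag1-listener", "SalesDb"))
print(listener_connection_string("ag1-listener", "SalesDb", read_only=True))
```

Because the server name is the listener, the same string keeps working after a failover changes which replica is primary.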

⚠️ Watch Out

FCI on Azure VMs using a virtual network name (VNN) requires a Standard Load Balancer with Floating IP enabled and a health probe on a dedicated probe port (not the SQL port itself). Without the LB + Floating IP, the FCI virtual name is not reachable from clients. Same rule as AG listener.

Migration Between HA/DR Topologies

From → To	Path	Cost
Standalone SQL VM → Log Shipping (DR)	Configure backup/copy/restore jobs to secondary VM	Easy; manual failover; periodic data loss
Log Shipping → Always On AG	Build cluster + AG; remove LS jobs	Cuts RPO; gain auto-failover; complexity
AG (sync, 1 secondary) → AG (sync + async secondary)	Add async secondary in DR region	DR coverage; secondary VM cost
AG → FCI	FCI only useful when shared storage required (legacy)	Rare in cloud; requires shared disk
Node Majority → Node + Cloud Witness	Reconfigure quorum via FCM	Adds a tiebreaker vote; tiny storage cost
File-share witness → Cloud Witness	Same as above	Tiny storage cost; cross-region resilient
AG sync mode → distributed AG	Forwarder pattern across regions	Cross-region; very complex topology
FCI on iSCSI → FCI on Azure Shared Disks ZRS	Reconfigure storage	Native Azure resilience

Most expensive moves: FCI on cloud (often the wrong tool) and distributed AG (operational complexity).

Real Scenarios

DR-only secondary, no auto-failover required → Log shipping with 15-min copy/restore, STANDBY mode for read reports. Driver: cheap, simple. Trade-off: RPO = 15 min, manual failover.
2-node AG sync + 1-node async DR → Cross-region async secondary in addition to local sync replica. Driver: HA + DR. Trade-off: 3 VMs licensed for SQL.
Legacy ERP requiring FCI → 2-node FCI on Azure Shared Disks ZRS, Standard LB + Floating IP, Cloud Witness in a 3rd region. Driver: app vendor mandates FCI. Trade-off: shared disk SKU + LB cost.
24/7 reporting workload → AG with read-only routing to async secondary. Driver: offload reads from primary. Trade-off: read-only secondary slightly behind.
Cross-region quorum without an extra VM → Node Majority + Cloud Witness in a third region. Driver: avoid 3rd VM cost. Trade-off: storage account dependency for quorum.

Flashcards

What are the 3 jobs in Log Shipping?

1) Backup Job (on primary — backs up transaction log). 2) Copy Job (on secondary — copies .trn files). 3) Restore Job (on secondary — restores .trn files). Default interval: 15 minutes each.

Quiz

A company uses SQL Server Standard Edition and needs a simple DR solution with up to 15-minute RPO. Which solution fits?
