Skip to main content

Log Shipping, Failover Cluster Instances & Quorum

HADR Options β€” Complete Reference​

HADR Options Overview
πŸ“¦
Log Shipping
Backup→Copy→Restore cycle. Simple, Standard Edition. Manual failover, 15 min RPO.
πŸ”„
Always On AG
DB-level HA. Sync/async replicas, auto failover, readable secondaries. Enterprise on-prem.
🏒
FCI
Instance-level HA. Shared storage, system DBs included. Requires WSFC + shared disks.
🌍
Failover Groups
Azure PaaS DR. Auto failover, single endpoint, cross-region. For SQL DB and MI.

Log Shipping (SQL VM Only)​

The simplest DR solution β€” automated backup β†’ copy β†’ restore cycle.

Architecture​

Log Shipping β€” 3-Job Cycle
πŸ’Ύ
Backup Job (Primary)
Backs up transaction log to file share
Every 15 minutes (default)
πŸ“€
Copy Job (Secondary)
Copies .trn files from share to local
Runs on secondary server
πŸ”„
Restore Job (Secondary)
Restores .trn files in order
NORECOVERY (not readable) or STANDBY (read-only between restores)

Three Jobs of Log Shipping​

JobRuns OnActionDefault Interval
BackupPrimaryBacks up transaction log β†’ file share15 minutes
CopySecondaryCopies .trn files from share to local15 minutes
RestoreSecondaryRestores .trn files on secondary DB15 minutes

Restore Modes​

ModeSecondary Readable?Behavior
NORECOVERY❌DB in Restoring state, cannot be queried
STANDBYβœ… (read-only)Users disconnected during each restore cycle, then read access resumes
🎯 Exam Focus

Log Shipping key facts: 1) NOT automatic failover β€” manual role change required. 2) RPO = backup interval (typically 15 min). 3) RTO = manual (could be hours). 4) Simpler than AG but less capable. 5) Works on Standard Edition (AG requires Enterprise on-prem).

🏒 Real-World DBA Note

Oracle DBA parallel: Log Shipping = Oracle Data Guard in Maximum Performance mode with manual archive log shipping. The backup-copy-restore cycle is like manually shipping archive logs to a standby. Always On AG is far superior (like Data Guard in Maximum Availability mode with automatic switchover).

Log Shipping setup β€” enable order​

Configure Log Shipping
1
Set DB to FULL recovery
ALTER DATABASE ... SET RECOVERY FULL
Take a full backup first β€” chain starts here
2
Provision shared file share
Both SQL service accounts need read/write
On Azure: Premium File Share or shared VM SMB
3
Configure on primary
Enable log shipping in DB Properties
Set backup job interval (default 15 min)
4
Configure on secondary
Initialize secondary from full backup
Pick NORECOVERY or STANDBY mode
Define copy + restore jobs
5
Optional: monitor server
Central log_shipping_monitor instance
Alert on backup/copy/restore lag
Common ordering trap

Do not initialize the secondary from a backup taken before you set RECOVERY = FULL. The log chain breaks and the restore job fails on the first .trn. Always: SET FULL β†’ take fresh full backup β†’ restore to secondary WITH NORECOVERY β†’ then enable log shipping jobs.

Log Shipping vs Always On AG​

FeatureLog ShippingAlways On AG
Auto failover❌ Manualβœ… Automatic
Data loss (RPO)Minutes (backup interval)0 (sync) or seconds (async)
Read secondaryβœ… (STANDBY mode, interrupted)βœ… (always readable)
MonitoringAlert job + history tablesAG dashboard + DMVs
SQL EditionStandard + EnterpriseEnterprise (on-prem), any (Azure)
Failover timeManual β€” hoursSeconds (automatic)
Max secondariesUnlimited9
ConfigurationSimpleComplex (WSFC + endpoints)

Failover Cluster Instance (FCI) on Azure VMs​

Provides instance-level HA (whole SQL Server instance fails over), unlike AG which is database-level.

Architecture​

Failover Cluster Instance (FCI) β€” Shared Storage
πŸ–₯️
Active Node
Runs SQL Server. Owns the shared storage and cluster IP. Only one node active at a time.
πŸ’Ύ
Shared Storage
Azure Shared Disks, Storage Spaces Direct, or Premium File Share. Both nodes access same data.
πŸ’»
Passive Node
Standby. On failover, starts SQL Server and attaches shared storage. System DBs transfer automatically.

FCI vs AG β€” Key Differences​

AspectFCIAG
ScopeEntire SQL instancePer-database
Shared storageRequiredNot needed (log shipping)
Both nodes run SQL?❌ (passive is standby)βœ… (secondary is readable)
System databasesβœ… Shared (master, msdb)❌ Not replicated
SQL Agent jobsβœ… Shared (in msdb)❌ Must sync manually
Instance-level objectsβœ… (logins, linked servers)❌ Must recreate
Read-only secondaryβŒβœ…
Cross-region DRComplexβœ… (async replica)

Shared Storage Options on Azure​

OptionPerformanceComplexityCost
Azure Shared DisksHigh (Premium SSD/Ultra)LowMedium
Storage Spaces Direct (S2D)Very HighHighHigh
Premium File Share (SMB 3.0)MediumLowLow
🎯 Exam Focus

When to use FCI over AG:

  1. You need instance-level failover (logins, jobs, linked servers all fail over together)
  2. You have Standard Edition (AG Basic only supports 1 DB on Standard)
  3. You need system database failover When to use AG: everything else (it's more flexible and doesn't need shared storage)

FCI on Azure VMs β€” setup order​

Build a SQL FCI on Azure
1
AD + WSFC foundation
Domain-joined VMs in same AZ-aware placement
Install Failover Clustering, create cluster -NoStorage
2
Add Cloud Witness
Storage account in a different region
Required for 2-node clusters
3
Provision shared storage
Azure Shared Disks (simplest), S2D, or Premium File Share
Add as Cluster Shared Volume / Available Storage
4
Install SQL as cluster role
Setup.exe β†’ "New SQL Server failover cluster installation" on node 1
"Add node" on node 2 (uses same cluster name)
5
Configure ILB for VNN
ILB frontend = SQL VNN IP, floating IP enabled
Probe port 59999 like AG Listener
Common ordering trap

Running standalone SQL setup first and then trying to "convert" to FCI is not supported. The first SQL install must be the clustered install path. Standalone-then-cluster forces a full uninstall/reinstall.

Quorum β€” Deep Dive​

Quorum prevents split-brain β€” ensures only one partition of the cluster keeps running.

How Quorum Voting Works​

Quorum Voting
πŸ—³οΈ
Majority Rule
N/2 + 1 votes needed. 2 nodes + witness = 3 votes β†’ survives 1 failure (2/3 majority).
☁️
Cloud Witness
Recommended for Azure. Blob in storage account. Region-independent tiebreaker. No extra VM.
⚠️
2 Nodes, No Witness
CANNOT survive any failure! Neither node has majority (1/2). Always add a witness.

Quorum Rule​

Majority (N/2 + 1) of total votes must agree for the cluster to stay online.

Cluster ConfigTotal VotesNode FailsRemaining VotesSurvives?
2 nodes, no witness21 node1/2❌ No majority
2 nodes + Cloud Witness31 node2/3βœ… Majority
3 nodes, no witness31 node2/3βœ… Majority
3 nodes + Cloud Witness41 node3/4βœ… Majority
3 nodes + Cloud Witness42 nodes2/4βœ… Majority
4 nodes + Cloud Witness52 nodes3/5βœ… Majority

Witness Types​

WitnessWhere StoredWhen to Use
Cloud Witness βœ…Azure Storage AccountRecommended for Azure VMs β€” region-independent, no extra VM
File Share WitnessSMB file shareHybrid setups with an on-prem file server
Disk WitnessShared cluster diskTraditional on-prem with shared SAN
🎯 Exam Focus

Critical quorum facts for DP-300:

  1. Cloud Witness is recommended for Azure clusters β€” it's a blob in a storage account
  2. 2-node cluster WITHOUT witness = CANNOT survive any failure (neither node has majority alone)
  3. Even number of nodes β†’ always add a witness (tiebreaker)
  4. Witness should be in a different region from the cluster nodes for maximum resilience

Monitoring HA/DR Solutions​

What to MonitorHowAlert Threshold
AG synchronization statesys.dm_hadr_database_replica_statesNOT SYNCHRONIZED
AG replica rolesys.dm_hadr_availability_replica_statesUnexpected role change
Log send/redo queuesys.dm_hadr_database_replica_states> acceptable RPO
Failover group lagAzure Monitor metricsReplication lag > threshold
Log shipping statusmsdb..log_shipping_monitor_* tablesBackup/restore older than threshold
Cluster healthFailover Cluster Manager eventsNode offline

Anti-Patterns​

  • "Log shipping = HA." Log shipping is DR / read-only reporting, not HA β€” manual failover, RPO measured in restore-job intervals (typically 15 min). For HA use AG or FCI.
  • "FCI shared disk on Azure VMs = use Premium SSD with each VM." FCI requires shared storage β€” on Azure that's Azure Shared Disks (ZRS) or Storage Spaces Direct (S2D). Single-attach Premium SSD does NOT work for FCI.
  • "3 nodes in cluster = quorum is fine." With 3 nodes and No Majority quorum mode, losing 2 nodes loses quorum. Use Node Majority + Cloud Witness for cross-region clusters or odd counts.
  • "Cloud Witness in same region as cluster." Defeats the purpose for region-failure scenarios. Place Cloud Witness in a separate region for DR-aware quorum.
  • "Log Shipping STANDBY mode β€” users can read forever." STANDBY disconnects users during each restore cycle. For 24/7 reporting use AG read-only routing instead.
  • "AG without listener β€” apps connect to current primary directly." That breaks on failover. Always configure listener + use it in connection strings.
⚠️ Watch Out

FCI on Azure VMs requires Standard Load Balancer with Floating IP enabled and a probe on TCP/SQL port. Without the LB + Floating IP, the FCI virtual name is not reachable from clients. Same rule as AG listener.


Migration Between HA/DR Topologies​

From β†’ ToPathCost
Standalone SQL VM β†’ Log Shipping (DR)Configure backup/copy/restore jobs to secondary VMEasy; manual failover; periodic data loss
Log Shipping β†’ Always On AGBuild cluster + AG; remove LS jobsCuts RPO; gain auto-failover; complexity
AG (sync, 1 secondary) β†’ AG (sync + async secondary)Add async secondary in DR regionDR coverage; secondary VM cost
AG β†’ FCIFCI only useful when shared storage required (legacy)Rare in cloud; requires shared disk
Node Majority β†’ Node + Cloud WitnessReconfigure quorum via FCMRemoves file-share witness dependency
File-share witness β†’ Cloud WitnessSame as aboveTiny storage cost; cross-region resilient
AG sync mode β†’ distributed AGForwarder pattern across regionsCross-region; very complex topology
FCI on iSCSI β†’ FCI on Azure Shared Disks ZRSReconfigure storageNative Azure resilience

Most expensive moves: FCI on cloud (often the wrong tool) and distributed AG (operational complexity).


Real Scenarios​

  1. DR-only secondary, no auto-failover required β†’ Log shipping with 15-min copy/restore, STANDBY mode for read reports. Driver: cheap, simple. Trade-off: RPO = 15 min, manual failover.
  2. 2-node AG sync + 1-node async DR β†’ Cross-region async secondary in addition to local sync replica. Driver: HA + DR. Trade-off: 3 VMs licensed for SQL.
  3. Legacy ERP requiring FCI β†’ 2-node FCI on Azure Shared Disks ZRS, Standard LB + Floating IP, Cloud Witness in a 3rd region. Driver: app vendor mandates FCI. Trade-off: shared disk SKU + LB cost.
  4. 24/7 reporting workload β†’ AG with read-only routing to async secondary. Driver: offload reads from primary. Trade-off: read-only secondary slightly behind.
  5. Cross-region quorum without an extra VM β†’ Node Majority + Cloud Witness in a third region. Driver: avoid 3rd VM cost. Trade-off: storage account dependency for quorum.

Flashcards​

What are the 3 jobs in Log Shipping?
Click to reveal answer
1) Backup Job (on primary β€” backs up transaction log). 2) Copy Job (on secondary β€” copies .trn files). 3) Restore Job (on secondary β€” restores .trn files). Default interval: 15 minutes each.
1 / 8

Quiz​

Q1/4
0 correct
A company uses SQL Server Standard Edition and needs a simple DR solution with up to 15-minute RPO. Which solution fits?