Skip to main content

Log Shipping, Failover Cluster Instances & Quorum

HADR Options β€” Complete Reference​

HADR Options Overview
πŸ“¦
Log Shipping
Backup→Copy→Restore cycle. Simple, Standard Edition. Manual failover, 15 min RPO.
πŸ”„
Always On AG
DB-level HA. Sync/async replicas, auto failover, readable secondaries. Enterprise on-prem.
🏒
FCI
Instance-level HA. Shared storage, system DBs included. Requires WSFC + shared disks.
🌍
Failover Groups
Azure PaaS DR. Auto failover, single endpoint, cross-region. For SQL DB and MI.

Log Shipping (SQL VM Only)​

The simplest DR solution β€” automated backup β†’ copy β†’ restore cycle.

Architecture​

Log Shipping β€” 3-Job Cycle
πŸ’Ύ
Backup Job (Primary)
Backs up transaction log to file share
Every 15 minutes (default)
πŸ“€
Copy Job (Secondary)
Copies .trn files from share to local
Runs on secondary server
πŸ”„
Restore Job (Secondary)
Restores .trn files in order
NORECOVERY (not readable) or STANDBY (read-only between restores)

Three Jobs of Log Shipping​

JobRuns OnActionDefault Interval
BackupPrimaryBacks up transaction log β†’ file share15 minutes
CopySecondaryCopies .trn files from share to local15 minutes
RestoreSecondaryRestores .trn files on secondary DB15 minutes

Restore Modes​

ModeSecondary Readable?Behavior
NORECOVERY❌DB in Restoring state, cannot be queried
STANDBYβœ… (read-only)Users disconnected during each restore cycle, then read access resumes
🎯 Exam Focus

Log Shipping key facts: 1) NOT automatic failover β€” manual role change required. 2) RPO = backup interval (typically 15 min). 3) RTO = manual (could be hours). 4) Simpler than AG but less capable. 5) Works on Standard Edition (AG requires Enterprise on-prem).

🏒 Real-World DBA Note

Oracle DBA parallel: Log Shipping = Oracle Data Guard in Maximum Performance mode with manual archive log shipping. The backup-copy-restore cycle is like manually shipping archive logs to a standby. Always On AG is far superior (like Data Guard in Maximum Availability mode with automatic switchover).

Log Shipping vs Always On AG​

FeatureLog ShippingAlways On AG
Auto failover❌ Manualβœ… Automatic
Data loss (RPO)Minutes (backup interval)0 (sync) or seconds (async)
Read secondaryβœ… (STANDBY mode, interrupted)βœ… (always readable)
MonitoringAlert job + history tablesAG dashboard + DMVs
SQL EditionStandard + EnterpriseEnterprise (on-prem), any (Azure)
Failover timeManual β€” hoursSeconds (automatic)
Max secondariesUnlimited9
ConfigurationSimpleComplex (WSFC + endpoints)

Failover Cluster Instance (FCI) on Azure VMs​

Provides instance-level HA (whole SQL Server instance fails over), unlike AG which is database-level.

Architecture​

Failover Cluster Instance (FCI) β€” Shared Storage
πŸ–₯️
Active Node
Runs SQL Server. Owns the shared storage and cluster IP. Only one node active at a time.
πŸ’Ύ
Shared Storage
Azure Shared Disks, Storage Spaces Direct, or Premium File Share. Both nodes access same data.
πŸ’»
Passive Node
Standby. On failover, starts SQL Server and attaches shared storage. System DBs transfer automatically.

FCI vs AG β€” Key Differences​

AspectFCIAG
ScopeEntire SQL instancePer-database
Shared storageRequiredNot needed (log shipping)
Both nodes run SQL?❌ (passive is standby)βœ… (secondary is readable)
System databasesβœ… Shared (master, msdb)❌ Not replicated
SQL Agent jobsβœ… Shared (in msdb)❌ Must sync manually
Instance-level objectsβœ… (logins, linked servers)❌ Must recreate
Read-only secondaryβŒβœ…
Cross-region DRComplexβœ… (async replica)

Shared Storage Options on Azure​

OptionPerformanceComplexityCost
Azure Shared DisksHigh (Premium SSD/Ultra)LowMedium
Storage Spaces Direct (S2D)Very HighHighHigh
Premium File Share (SMB 3.0)MediumLowLow
🎯 Exam Focus

When to use FCI over AG:

  1. You need instance-level failover (logins, jobs, linked servers all fail over together)
  2. You have Standard Edition (AG Basic only supports 1 DB on Standard)
  3. You need system database failover When to use AG: everything else (it's more flexible and doesn't need shared storage)

Quorum β€” Deep Dive​

Quorum prevents split-brain β€” ensures only one partition of the cluster keeps running.

How Quorum Voting Works​

Quorum Voting
πŸ—³οΈ
Majority Rule
N/2 + 1 votes needed. 2 nodes + witness = 3 votes β†’ survives 1 failure (2/3 majority).
☁️
Cloud Witness
Recommended for Azure. Blob in storage account. Region-independent tiebreaker. No extra VM.
⚠️
2 Nodes, No Witness
CANNOT survive any failure! Neither node has majority (1/2). Always add a witness.

Quorum Rule​

Majority (N/2 + 1) of total votes must agree for the cluster to stay online.

Cluster ConfigTotal VotesNode FailsRemaining VotesSurvives?
2 nodes, no witness21 node1/2❌ No majority
2 nodes + Cloud Witness31 node2/3βœ… Majority
3 nodes, no witness31 node2/3βœ… Majority
3 nodes + Cloud Witness41 node3/4βœ… Majority
3 nodes + Cloud Witness42 nodes2/4βœ… Majority
4 nodes + Cloud Witness52 nodes3/5βœ… Majority

Witness Types​

WitnessWhere StoredWhen to Use
Cloud Witness βœ…Azure Storage AccountRecommended for Azure VMs β€” region-independent, no extra VM
File Share WitnessSMB file shareHybrid setups with an on-prem file server
Disk WitnessShared cluster diskTraditional on-prem with shared SAN
🎯 Exam Focus

Critical quorum facts for DP-300:

  1. Cloud Witness is recommended for Azure clusters β€” it's a blob in a storage account
  2. 2-node cluster WITHOUT witness = CANNOT survive any failure (neither node has majority alone)
  3. Even number of nodes β†’ always add a witness (tiebreaker)
  4. Witness should be in a different region from the cluster nodes for maximum resilience

Monitoring HA/DR Solutions​

What to MonitorHowAlert Threshold
AG synchronization statesys.dm_hadr_database_replica_statesNOT SYNCHRONIZED
AG replica rolesys.dm_hadr_availability_replica_statesUnexpected role change
Log send/redo queuesys.dm_hadr_database_replica_states> acceptable RPO
Failover group lagAzure Monitor metricsReplication lag > threshold
Log shipping statusmsdb..log_shipping_monitor_* tablesBackup/restore older than threshold
Cluster healthFailover Cluster Manager eventsNode offline

Flashcards​

What are the 3 jobs in Log Shipping?
Click to reveal answer
1) Backup Job (on primary β€” backs up transaction log). 2) Copy Job (on secondary β€” copies .trn files). 3) Restore Job (on secondary β€” restores .trn files). Default interval: 15 minutes each.
1 / 8

Quiz​

Q1/4
0 correct
A company uses SQL Server Standard Edition and needs a simple DR solution with up to 15-minute RPO. Which solution fits?