Log Shipping, Failover Cluster Instances & Quorum
HADR Options β Complete Referenceβ
Log Shipping (SQL VM Only)β
The simplest DR solution β automated backup β copy β restore cycle.
Architectureβ
Three Jobs of Log Shippingβ
| Job | Runs On | Action | Default Interval |
|---|---|---|---|
| Backup | Primary | Backs up transaction log β file share | 15 minutes |
| Copy | Secondary | Copies .trn files from share to local | 15 minutes |
| Restore | Secondary | Restores .trn files on secondary DB | 15 minutes |
Restore Modesβ
| Mode | Secondary Readable? | Behavior |
|---|---|---|
| NORECOVERY | β | DB in Restoring state, cannot be queried |
| STANDBY | β (read-only) | Users disconnected during each restore cycle, then read access resumes |
Log Shipping key facts: 1) NOT automatic failover β manual role change required. 2) RPO = backup interval (typically 15 min). 3) RTO = manual (could be hours). 4) Simpler than AG but less capable. 5) Works on Standard Edition (AG requires Enterprise on-prem).
Oracle DBA parallel: Log Shipping = Oracle Data Guard in Maximum Performance mode with manual archive log shipping. The backup-copy-restore cycle is like manually shipping archive logs to a standby. Always On AG is far superior (like Data Guard in Maximum Availability mode with automatic switchover).
Log Shipping vs Always On AGβ
| Feature | Log Shipping | Always On AG |
|---|---|---|
| Auto failover | β Manual | β Automatic |
| Data loss (RPO) | Minutes (backup interval) | 0 (sync) or seconds (async) |
| Read secondary | β (STANDBY mode, interrupted) | β (always readable) |
| Monitoring | Alert job + history tables | AG dashboard + DMVs |
| SQL Edition | Standard + Enterprise | Enterprise (on-prem), any (Azure) |
| Failover time | Manual β hours | Seconds (automatic) |
| Max secondaries | Unlimited | 9 |
| Configuration | Simple | Complex (WSFC + endpoints) |
Failover Cluster Instance (FCI) on Azure VMsβ
Provides instance-level HA (whole SQL Server instance fails over), unlike AG which is database-level.
Architectureβ
FCI vs AG β Key Differencesβ
| Aspect | FCI | AG |
|---|---|---|
| Scope | Entire SQL instance | Per-database |
| Shared storage | Required | Not needed (log shipping) |
| Both nodes run SQL? | β (passive is standby) | β (secondary is readable) |
| System databases | β Shared (master, msdb) | β Not replicated |
| SQL Agent jobs | β Shared (in msdb) | β Must sync manually |
| Instance-level objects | β (logins, linked servers) | β Must recreate |
| Read-only secondary | β | β |
| Cross-region DR | Complex | β (async replica) |
Shared Storage Options on Azureβ
| Option | Performance | Complexity | Cost |
|---|---|---|---|
| Azure Shared Disks | High (Premium SSD/Ultra) | Low | Medium |
| Storage Spaces Direct (S2D) | Very High | High | High |
| Premium File Share (SMB 3.0) | Medium | Low | Low |
When to use FCI over AG:
- You need instance-level failover (logins, jobs, linked servers all fail over together)
- You have Standard Edition (AG Basic only supports 1 DB on Standard)
- You need system database failover When to use AG: everything else (it's more flexible and doesn't need shared storage)
Quorum β Deep Diveβ
Quorum prevents split-brain β ensures only one partition of the cluster keeps running.
How Quorum Voting Worksβ
Quorum Ruleβ
Majority (N/2 + 1) of total votes must agree for the cluster to stay online.
| Cluster Config | Total Votes | Node Fails | Remaining Votes | Survives? |
|---|---|---|---|---|
| 2 nodes, no witness | 2 | 1 node | 1/2 | β No majority |
| 2 nodes + Cloud Witness | 3 | 1 node | 2/3 | β Majority |
| 3 nodes, no witness | 3 | 1 node | 2/3 | β Majority |
| 3 nodes + Cloud Witness | 4 | 1 node | 3/4 | β Majority |
| 3 nodes + Cloud Witness | 4 | 2 nodes | 2/4 | β Majority |
| 4 nodes + Cloud Witness | 5 | 2 nodes | 3/5 | β Majority |
Witness Typesβ
| Witness | Where Stored | When to Use |
|---|---|---|
| Cloud Witness β | Azure Storage Account | Recommended for Azure VMs β region-independent, no extra VM |
| File Share Witness | SMB file share | Hybrid setups with an on-prem file server |
| Disk Witness | Shared cluster disk | Traditional on-prem with shared SAN |
Critical quorum facts for DP-300:
- Cloud Witness is recommended for Azure clusters β it's a blob in a storage account
- 2-node cluster WITHOUT witness = CANNOT survive any failure (neither node has majority alone)
- Even number of nodes β always add a witness (tiebreaker)
- Witness should be in a different region from the cluster nodes for maximum resilience
Monitoring HA/DR Solutionsβ
| What to Monitor | How | Alert Threshold |
|---|---|---|
| AG synchronization state | sys.dm_hadr_database_replica_states | NOT SYNCHRONIZED |
| AG replica role | sys.dm_hadr_availability_replica_states | Unexpected role change |
| Log send/redo queue | sys.dm_hadr_database_replica_states | > acceptable RPO |
| Failover group lag | Azure Monitor metrics | Replication lag > threshold |
| Log shipping status | msdb..log_shipping_monitor_* tables | Backup/restore older than threshold |
| Cluster health | Failover Cluster Manager events | Node offline |