Skip to main content

SQL Managed Instance — High Availability Architecture

SQL Managed Instance has built-in HA that works differently from SQL Server on VMs. You don't configure WSFC or load balancers — Microsoft handles it. But understanding the architecture is critical for DP-300 and for choosing the right service tier.

General Purpose vs Business Critical HA

SQL MI HA Architecture - GP vs BC
MI HA: GP vs BC
📦
General Purpose
Service Fabric manages compute
Remote Azure Premium Storage
1 node + 1 standby (no data)
Failover: ~30 seconds
No read replicas
SLA: 99.99%
👑
Business Critical
4-node Always On AG (local SSD)
1 primary + 3 secondaries
Sync commit on all replicas
Failover: ~5-10 seconds
1 free read-only replica
SLA: 99.99% (99.995% zone-redundant)

General Purpose Tier

AspectDetails
HA MechanismAzure Service Fabric manages compute. Storage is remote (Azure Premium Storage).
Replicas1 compute node + 1 standby (no data, spun up on failure)
Failover time~30 seconds (new node must attach remote storage)
Read replicas❌ None
StorageRemote — Azure Premium Storage (up to 16 TB)
IOPSLower — depends on storage tier, not local SSD
Use caseCost-sensitive workloads, dev/test, apps tolerant of brief failover
SLA99.99%

How General Purpose failover works:

  1. Azure detects compute node failure
  2. A new compute node is provisioned (from standby pool)
  3. The new node attaches to the existing remote storage
  4. Databases come online — no data loss (storage is durable)
  5. Client connections are redirected via the gateway
⚠️ Watch Out

General Purpose failover can take up to 30 seconds or longer because the standby node needs to start SQL Server and attach remote storage files. Applications should implement retry logic with exponential backoff.

Business Critical Tier

AspectDetails
HA MechanismBuilt-in Always On AG (similar technology as SQL Server AG)
Replicas1 primary + 3 secondary replicas (all with local data)
Failover time~5-10 seconds
Read replicas✅ 1 free read-only replica (built-in read scale-out)
StorageLocal SSD on each replica — much faster I/O
IOPSHigher — local NVMe/SSD storage
Use caseMission-critical workloads, OLTP, low latency requirements
SLA99.99%

How Business Critical failover works:

  1. Primary replica failure is detected
  2. One of the 3 secondaries is promoted to primary (all have local data copies)
  3. Failover is near-instant because data is already local
  4. The gateway redirects connections to the new primary
  5. A new secondary is provisioned in the background
Business Critical Failover Sequence
🚨
Failure Detected
Primary replica failure identified
Health monitoring triggers failover
Secondary Promoted
One of 3 secondaries becomes primary
Data already local on SSD — near-instant
🔄
Gateway Redirects
Connections routed to new primary
Failover completes in ~5-10 seconds
🛠️
New Secondary Provisioned
Background: replacement replica created
4-node AG restored to full health
🎯 Exam Focus

Business Critical = Always On AG under the hood (4-node AG with synchronous commit). You get one free read-only replica accessible via the connection string option ApplicationIntent=ReadOnly. This is the same technology as SQL Server AG but fully managed.

VNet Architecture

Unlike Azure SQL Database, MI is deployed inside your VNet:

MI VNet Architecture
🌐
Dedicated Subnet
MI requires its own subnet (/27 min, /26 recommended). Delegated to Microsoft.Sql/managedInstances.
🛡️
NSG + Route Table
Auto-created by Azure. Mandatory management routes. You can add additional rules.
🔗
Gateway
Connection gateway routes traffic to primary replica. Handles failover redirection transparently.

Subnet Requirements

RequirementDetails
Dedicated subnetMI requires its own subnet — no other resources allowed
Minimum size/27 (32 IPs) for a single MI, /26 recommended
NSGAuto-created and managed by Azure; you can add additional rules
Route TableAuto-created; includes mandatory routes for MI management
DelegationSubnet must be delegated to Microsoft.Sql/managedInstances
⚠️ Watch Out

Don't put anything else in the MI subnet! The subnet must be delegated exclusively to Managed Instance. Also, you cannot easily change the subnet size after MI is deployed — plan for growth (use /26 or /25).

Failover Groups for Managed Instance

For cross-region DR, MI uses Failover Groups (similar to Azure SQL DB):

# Create MI failover group
az sql instance-failover-group create \
--resource-group rg-dp300 \
--location eastus \
--mi mi-primary \
--partner-mi mi-secondary \
--partner-resource-group rg-dp300-dr \
--name fog-mi-dp300 \
--failover-policy Automatic \
--grace-period 60

# Connection endpoints (auto-generated):
# Read-Write: fog-mi-dp300.zone-id.database.windows.net
# Read-Only: fog-mi-dp300.secondary.zone-id.database.windows.net
FeatureDetails
ScopeAll user databases on the instance (all or nothing)
EndpointSingle DNS endpoint, no change on failover
FailoverAutomatic with configurable grace period
ReplicationAsynchronous geo-replication
Read-OnlySecondary is readable via .secondary endpoint
🎯 Exam Focus

MI Failover Groups replicate all user databases — you can't pick individual databases like in Azure SQL DB. The system databases (master, msdb) are NOT replicated. Logins and agent jobs must be manually synced or scripted.

The MI Link enables near real-time data replication between on-premises SQL Server and Managed Instance:

MI Link Feature
🏢
On-Prem SQL Server
SQL Server 2016+ as primary
Distributed AG technology
🔄
Near Real-Time Replication
Continuous log replication to MI
MI secondary is readable — offload reporting
✂️
Cutover
Break link when ready
MI becomes primary with minimal downtime

Use cases:

  • Offload reporting to MI while keeping on-prem as primary
  • Migration staging — replicate continuously, then cut over
  • Hybrid DR — MI as a disaster recovery target

Performance Comparison

MetricGeneral PurposeBusiness Critical
vCores4-804-128
Max storage16 TB16 TB
IOPS~500-7500 (storage-dependent)~30,000-400,000+ (local SSD)
Latency~5-10ms (remote storage)~1-2ms (local SSD)
Read replicas01 (free)
Failover time~30s~5-10s
Zone redundancy✅ (optional)✅ (optional)
In-memory OLTP
Price~40-60% lessHigher
🏢 Real-World DBA Note

Start with General Purpose for most workloads. Move to Business Critical only if you need: sub-2ms latency, very high IOPS, read replicas, or In-memory OLTP. The cost difference is significant — Business Critical is roughly 2-3x the price.


Flashcards

How many replicas does MI Business Critical have?
Click to reveal answer
4 total: 1 primary + 3 secondary replicas. All have local SSD copies. One secondary is exposed as a free read-only replica.
1 / 10

Quiz

Q1/6
0 correct
What HA technology powers SQL Managed Instance Business Critical tier?