Skip to main content

SQL Managed Instance — High Availability Architecture

SQL Managed Instance has built-in HA that works differently from SQL Server on VMs. You don't configure WSFC or load balancers — Microsoft handles it. But understanding the architecture is critical for DP-300 and for choosing the right service tier.

General Purpose vs Business Critical HA

SQL MI HA Architecture - GP vs BC
MI HA: GP vs BC
📦
General Purpose
Service Fabric manages compute
Remote Azure Premium Storage
1 node + 1 standby (no data)
Failover: ~30 seconds
No read replicas
SLA: 99.99%
👑
Business Critical
4-node Always On AG (local SSD)
1 primary + 3 secondaries
Sync commit on all replicas
Failover: ~5-10 seconds
1 free read-only replica
SLA: 99.99% (99.995% zone-redundant)

General Purpose Tier

AspectDetails
HA MechanismAzure Service Fabric manages compute. Storage is remote (Azure Premium Storage).
Replicas1 compute node + 1 standby (no data, spun up on failure)
Failover time~30 seconds (new node must attach remote storage)
Read replicas❌ None
StorageRemote — Azure Premium Storage (up to 16 TB)
IOPSLower — depends on storage tier, not local SSD
Use caseCost-sensitive workloads, dev/test, apps tolerant of brief failover
SLA99.99%

How General Purpose failover works:

  1. Azure detects compute node failure
  2. A new compute node is provisioned (from standby pool)
  3. The new node attaches to the existing remote storage
  4. Databases come online — no data loss (storage is durable)
  5. Client connections are redirected via the gateway
⚠️ Watch Out

General Purpose failover can take up to 30 seconds or longer because the standby node needs to start SQL Server and attach remote storage files. Applications should implement retry logic with exponential backoff.

Business Critical Tier

AspectDetails
HA MechanismBuilt-in Always On AG (similar technology as SQL Server AG)
Replicas1 primary + 3 secondary replicas (all with local data)
Failover time~5-10 seconds
Read replicas✅ 1 free read-only replica (built-in read scale-out)
StorageLocal SSD on each replica — much faster I/O
IOPSHigher — local NVMe/SSD storage
Use caseMission-critical workloads, OLTP, low latency requirements
SLA99.99%

How Business Critical failover works:

  1. Primary replica failure is detected
  2. One of the 3 secondaries is promoted to primary (all have local data copies)
  3. Failover is near-instant because data is already local
  4. The gateway redirects connections to the new primary
  5. A new secondary is provisioned in the background
Business Critical Failover Sequence
🚨
Failure Detected
Primary replica failure identified
Health monitoring triggers failover
Secondary Promoted
One of 3 secondaries becomes primary
Data already local on SSD — near-instant
🔄
Gateway Redirects
Connections routed to new primary
Failover completes in ~5-10 seconds
🛠️
New Secondary Provisioned
Background: replacement replica created
4-node AG restored to full health
🎯 Exam Focus

Business Critical = Always On AG under the hood (4-node AG with synchronous commit). You get one free read-only replica accessible via the connection string option ApplicationIntent=ReadOnly. This is the same technology as SQL Server AG but fully managed.

VNet Architecture

Unlike Azure SQL Database, MI is deployed inside your VNet:

MI VNet Architecture
🌐
Dedicated Subnet
MI requires its own subnet (/27 min, /26 recommended). Delegated to Microsoft.Sql/managedInstances.
🛡️
NSG + Route Table
Auto-created by Azure. Mandatory management routes. You can add additional rules.
🔗
Gateway
Connection gateway routes traffic to primary replica. Handles failover redirection transparently.

Subnet Requirements

RequirementDetails
Dedicated subnetMI requires its own subnet — no other resources allowed
Minimum size/27 (32 IPs) for a single MI, /26 recommended
NSGAuto-created and managed by Azure; you can add additional rules
Route TableAuto-created; includes mandatory routes for MI management
DelegationSubnet must be delegated to Microsoft.Sql/managedInstances
⚠️ Watch Out

Don't put anything else in the MI subnet! The subnet must be delegated exclusively to Managed Instance. Also, you cannot easily change the subnet size after MI is deployed — plan for growth (use /26 or /25).

Failover Groups for Managed Instance

For cross-region DR, MI uses Failover Groups (similar to Azure SQL DB):

# Create MI failover group
az sql instance-failover-group create \
--resource-group rg-dp300 \
--location eastus \
--mi mi-primary \
--partner-mi mi-secondary \
--partner-resource-group rg-dp300-dr \
--name fog-mi-dp300 \
--failover-policy Automatic \
--grace-period 60

# Connection endpoints (auto-generated):
# Read-Write: fog-mi-dp300.zone-id.database.windows.net
# Read-Only: fog-mi-dp300.secondary.zone-id.database.windows.net
FeatureDetails
ScopeAll user databases on the instance (all or nothing)
EndpointSingle DNS endpoint, no change on failover
FailoverAutomatic with configurable grace period
ReplicationAsynchronous geo-replication
Read-OnlySecondary is readable via .secondary endpoint
🎯 Exam Focus

MI Failover Groups replicate all user databases — you can't pick individual databases like in Azure SQL DB. The system databases (master, msdb) are NOT replicated. Logins and agent jobs must be manually synced or scripted.

The MI Link enables near real-time data replication between on-premises SQL Server and Managed Instance:

MI Link Feature
🏢
On-Prem SQL Server
SQL Server 2016+ as primary
Distributed AG technology
🔄
Near Real-Time Replication
Continuous log replication to MI
MI secondary is readable — offload reporting
✂️
Cutover
Break link when ready
MI becomes primary with minimal downtime

Use cases:

  • Offload reporting to MI while keeping on-prem as primary
  • Migration staging — replicate continuously, then cut over
  • Hybrid DR — MI as a disaster recovery target

Performance Comparison

MetricGeneral PurposeBusiness Critical
vCores4-804-128
Max storage16 TB16 TB
IOPS~500-7500 (storage-dependent)~30,000-400,000+ (local SSD)
Latency~5-10ms (remote storage)~1-2ms (local SSD)
Read replicas01 (free)
Failover time~30s~5-10s
Zone redundancy✅ (optional)✅ (optional)
In-memory OLTP
Price~40-60% lessHigher
🏢 Real-World DBA Note

Start with General Purpose for most workloads. Move to Business Critical only if you need: sub-2ms latency, very high IOPS, read replicas, or In-memory OLTP. The cost difference is significant — Business Critical is roughly 2-3x the price.


Anti-Patterns

  • "BC tier = our DR." BC's 4 replicas are all in one region (zone-redundant if enabled). A regional outage takes the whole instance. Pair BC with a Failover Group for DR.
  • "GP tier failover is too slow for production." ~30 s is fine for the vast majority of OLTP. Re-architecting to BC for the failover speed alone is usually a 2–3× cost increase for a few seconds.
  • "Use the BC read replica for everything." It's an eventually-consistent secondary. Reports tolerating < 5 s lag = fine. Anything reading-then-writing on the same data = race condition.
  • "MI Failover Group fails over one DB." No — MI FOG is instance-scoped, all-or-nothing. All user DBs fail over together. If apps need per-DB control, MI is the wrong target.
  • "Skip the subnet planning." MI subnets are dedicated, immutable, and minimum /27 (32 IPs). Resizing means redeploying the MI. Right-size at create.
  • "GP → BC scaling is online and free." Online yes, free no — the BC tier is 2–3× the cost. The scaling operation itself can take hours for large DBs because the storage layer changes (remote → local SSD).
⚠️ Watch Out

MI subnet provisioning is the longest single step in MI deployment (delegation, NSG, route table, ~4 hour first-instance provisioning). Subnets cannot be reused across MI instances if the MIs need different versions / features. Plan subnet sprawl carefully.


Migration Between Tiers / HA Postures

From → ToPathCost
GP → BCOnline tier scaleLong (hours for big DBs) due to storage layer rebuild; 2–3× cost
BC → GPOnline tier scaleSame long operation reversed; saves cost; loses read replica + In-Memory OLTP
Single-region MI → MI + Failover GroupAdd secondary MI in paired region; create FOGDoubles MI cost; instance-level FOG (all DBs)
Non-zone-redundant → Zone-redundantConfigure at instance level (BC supports it; GP rolling out)Slight latency increase; gain AZ resilience
MI → SQL VMBackup/restore or AG seeding to a new VMDrop PaaS; gain CmdExec/CLR/SSIS Server-side execution
SQL VM → MIMI Link (online) or LRS (offline) or backup/restoreLose host control; gain managed lifecycle
MI Link primary on-prem → cutover to MIPromote MI side; redirect appsOnline during sync; cutover window for app reconfig
MI standard → NextGen General PurposeIn-place via service tier change (where available)Larger storage ceiling (128 TB), better IOPS

The expensive move is GP ↔ BC — not the operation, but the price delta.


Real Scenarios

  1. ERP migration from on-prem SQL 2019 with SQL Agent + CLR + SSISMI BC, zone-redundant, FOG to paired region. Driver: near-100 % compatibility + HA + DR. Trade-off: BC pricing + 4-hour first MI provisioning + permanent subnet.
  2. Internal app, OLTP, modest IOPSMI GP, zone-redundant, no FOG. Driver: cost-effective PaaS, ~30 s failover acceptable. Trade-off: regional outage = downtime, RPO from PITR.
  3. Read-heavy reporting on top of OLTPMI BC, point reports at the readable secondary. Driver: offload reads with no app rewrite. Trade-off: BC cost; replicas eventually consistent.
  4. Multi-region active-passive ISV deploymentMI BC + FOG in paired region. Driver: automatic failover + single endpoint. Trade-off: 2× MI cost; all DBs fail over together.
  5. Lift-and-shift of a 64 TB legacy DBMI NextGen GP (standard MI tops at 16 TB on most SKUs). Driver: only MI variant that fits. Trade-off: NextGen GP region availability still rolling out.

Flashcards

How many replicas does MI Business Critical have?
Click to reveal answer
4 total: 1 primary + 3 secondary replicas. All have local SSD copies. One secondary is exposed as a free read-only replica.
1 / 10

Quiz

Q1/6
0 correct
What HA technology powers SQL Managed Instance Business Critical tier?