SQL Managed Instance — High Availability Architecture

SQL Managed Instance has built-in HA that works differently from SQL Server on VMs. You don't configure WSFC or load balancers — Microsoft handles it. But understanding the architecture is critical for DP-300 and for choosing the right service tier.

General Purpose vs Business Critical HA

MI HA: GP vs BC

📦

General Purpose

Service Fabric manages compute

Remote Azure Premium Storage

1 node + 1 standby (no data)

Failover: ~30 seconds

No read replicas

SLA: 99.99%

👑

Business Critical

4-node Always On AG (local SSD)

1 primary + 3 secondaries

Sync commit on all replicas

Failover: ~5-10 seconds

1 free read-only replica

SLA: 99.99% (99.995% zone-redundant)

General Purpose Tier

Aspect	Details
HA Mechanism	Azure Service Fabric manages compute. Storage is remote (Azure Premium Storage).
Replicas	1 compute node + 1 standby (no data, spun up on failure)
Failover time	~30 seconds (new node must attach remote storage)
Read replicas	❌ None
Storage	Remote — Azure Premium Storage (up to 16 TB)
IOPS	Lower — depends on storage tier, not local SSD
Use case	Cost-sensitive workloads, dev/test, apps tolerant of brief failover
SLA	99.99%

How General Purpose failover works:

Azure detects compute node failure
A new compute node is provisioned (from standby pool)
The new node attaches to the existing remote storage
Databases come online — no data loss (storage is durable)
Client connections are redirected via the gateway

⚠️ Watch Out

General Purpose failover can take up to 30 seconds or longer because the standby node needs to start SQL Server and attach remote storage files. Applications should implement retry logic with exponential backoff.

Business Critical Tier

Aspect	Details
HA Mechanism	Built-in Always On AG (similar technology as SQL Server AG)
Replicas	1 primary + 3 secondary replicas (all with local data)
Failover time	~5-10 seconds
Read replicas	✅ 1 free read-only replica (built-in read scale-out)
Storage	Local SSD on each replica — much faster I/O
IOPS	Higher — local NVMe/SSD storage
Use case	Mission-critical workloads, OLTP, low latency requirements
SLA	99.99%

How Business Critical failover works:

Primary replica failure is detected
One of the 3 secondaries is promoted to primary (all have local data copies)
Failover is near-instant because data is already local
The gateway redirects connections to the new primary
A new secondary is provisioned in the background

Business Critical Failover Sequence

🚨

Failure Detected

Primary replica failure identified

Health monitoring triggers failover

⚡

Secondary Promoted

One of 3 secondaries becomes primary

Data already local on SSD — near-instant

🔄

Gateway Redirects

Connections routed to new primary

Failover completes in ~5-10 seconds

🛠️

New Secondary Provisioned

Background: replacement replica created

4-node AG restored to full health

🎯 Exam Focus

Business Critical = Always On AG under the hood (4-node AG with synchronous commit). You get one free read-only replica accessible via the connection string option ApplicationIntent=ReadOnly. This is the same technology as SQL Server AG but fully managed.

VNet Architecture

Unlike Azure SQL Database, MI is deployed inside your VNet:

MI VNet Architecture

🌐

Dedicated Subnet

MI requires its own subnet (/27 min, /26 recommended). Delegated to Microsoft.Sql/managedInstances.

🛡️

NSG + Route Table

Auto-created by Azure. Mandatory management routes. You can add additional rules.

🔗

Gateway

Connection gateway routes traffic to primary replica. Handles failover redirection transparently.

Subnet Requirements

Requirement	Details
Dedicated subnet	MI requires its own subnet — no other resources allowed
Minimum size	/27 (32 IPs) for a single MI, /26 recommended
NSG	Auto-created and managed by Azure; you can add additional rules
Route Table	Auto-created; includes mandatory routes for MI management
Delegation	Subnet must be delegated to `Microsoft.Sql/managedInstances`

⚠️ Watch Out

Don't put anything else in the MI subnet! The subnet must be delegated exclusively to Managed Instance. Also, you cannot easily change the subnet size after MI is deployed — plan for growth (use /26 or /25).

Failover Groups for Managed Instance

For cross-region DR, MI uses Failover Groups (similar to Azure SQL DB):

# Create MI failover group
az sql instance-failover-group create \
  --resource-group rg-dp300 \
  --location eastus \
  --mi mi-primary \
  --partner-mi mi-secondary \
  --partner-resource-group rg-dp300-dr \
  --name fog-mi-dp300 \
  --failover-policy Automatic \
  --grace-period 60

# Connection endpoints (auto-generated):
# Read-Write: fog-mi-dp300.zone-id.database.windows.net
# Read-Only:  fog-mi-dp300.secondary.zone-id.database.windows.net

Feature	Details
Scope	All user databases on the instance (all or nothing)
Endpoint	Single DNS endpoint, no change on failover
Failover	Automatic with configurable grace period
Replication	Asynchronous geo-replication
Read-Only	Secondary is readable via `.secondary` endpoint

🎯 Exam Focus

MI Failover Groups replicate all user databases — you can't pick individual databases like in Azure SQL DB. The system databases (master, msdb) are NOT replicated. Logins and agent jobs must be manually synced or scripted.

MI Link Feature

The MI Link enables near real-time data replication between on-premises SQL Server and Managed Instance:

MI Link Feature

🏢

On-Prem SQL Server

SQL Server 2016+ as primary

Distributed AG technology

🔄

Near Real-Time Replication

Continuous log replication to MI

MI secondary is readable — offload reporting

✂️

Cutover

Break link when ready

MI becomes primary with minimal downtime

Use cases:

Offload reporting to MI while keeping on-prem as primary
Migration staging — replicate continuously, then cut over
Hybrid DR — MI as a disaster recovery target

Performance Comparison

Metric	General Purpose	Business Critical
vCores	4-80	4-128
Max storage	16 TB	16 TB
IOPS	~500-7500 (storage-dependent)	~30,000-400,000+ (local SSD)
Latency	~5-10ms (remote storage)	~1-2ms (local SSD)
Read replicas	0	1 (free)
Failover time	~30s	~5-10s
Zone redundancy	✅ (optional)	✅ (optional)
In-memory OLTP	❌	✅
Price	~40-60% less	Higher

🏢 Real-World DBA Note

Start with General Purpose for most workloads. Move to Business Critical only if you need: sub-2ms latency, very high IOPS, read replicas, or In-memory OLTP. The cost difference is significant — Business Critical is roughly 2-3x the price.

Anti-Patterns

"BC tier = our DR." BC's 4 replicas are all in one region (zone-redundant if enabled). A regional outage takes the whole instance. Pair BC with a Failover Group for DR.
"GP tier failover is too slow for production." ~30 s is fine for the vast majority of OLTP. Re-architecting to BC for the failover speed alone is usually a 2–3× cost increase for a few seconds.
"Use the BC read replica for everything." It's an eventually-consistent secondary. Reports tolerating < 5 s lag = fine. Anything reading-then-writing on the same data = race condition.
"MI Failover Group fails over one DB." No — MI FOG is instance-scoped, all-or-nothing. All user DBs fail over together. If apps need per-DB control, MI is the wrong target.
"Skip the subnet planning." MI subnets are dedicated, immutable, and minimum /27 (32 IPs). Resizing means redeploying the MI. Right-size at create.
"GP → BC scaling is online and free." Online yes, free no — the BC tier is 2–3× the cost. The scaling operation itself can take hours for large DBs because the storage layer changes (remote → local SSD).

⚠️ Watch Out

MI subnet provisioning is the longest single step in MI deployment (delegation, NSG, route table, ~4 hour first-instance provisioning). Subnets cannot be reused across MI instances if the MIs need different versions / features. Plan subnet sprawl carefully.

Migration Between Tiers / HA Postures

From → To	Path	Cost
GP → BC	Online tier scale	Long (hours for big DBs) due to storage layer rebuild; 2–3× cost
BC → GP	Online tier scale	Same long operation reversed; saves cost; loses read replica + In-Memory OLTP
Single-region MI → MI + Failover Group	Add secondary MI in paired region; create FOG	Doubles MI cost; instance-level FOG (all DBs)
Non-zone-redundant → Zone-redundant	Configure at instance level (BC supports it; GP rolling out)	Slight latency increase; gain AZ resilience
MI → SQL VM	Backup/restore or AG seeding to a new VM	Drop PaaS; gain CmdExec/CLR/SSIS Server-side execution
SQL VM → MI	MI Link (online) or LRS (offline) or backup/restore	Lose host control; gain managed lifecycle
MI Link primary on-prem → cutover to MI	Promote MI side; redirect apps	Online during sync; cutover window for app reconfig
MI standard → NextGen General Purpose	In-place via service tier change (where available)	Larger storage ceiling (128 TB), better IOPS

The expensive move is GP ↔ BC — not the operation, but the price delta.

Real Scenarios

ERP migration from on-prem SQL 2019 with SQL Agent + CLR + SSIS → MI BC, zone-redundant, FOG to paired region. Driver: near-100 % compatibility + HA + DR. Trade-off: BC pricing + 4-hour first MI provisioning + permanent subnet.
Internal app, OLTP, modest IOPS → MI GP, zone-redundant, no FOG. Driver: cost-effective PaaS, ~30 s failover acceptable. Trade-off: regional outage = downtime, RPO from PITR.
Read-heavy reporting on top of OLTP → MI BC, point reports at the readable secondary. Driver: offload reads with no app rewrite. Trade-off: BC cost; replicas eventually consistent.
Multi-region active-passive ISV deployment → MI BC + FOG in paired region. Driver: automatic failover + single endpoint. Trade-off: 2× MI cost; all DBs fail over together.
Lift-and-shift of a 64 TB legacy DB → MI NextGen GP (standard MI tops at 16 TB on most SKUs). Driver: only MI variant that fits. Trade-off: NextGen GP region availability still rolling out.

Flashcards

How many replicas does MI Business Critical have?

Click to reveal answer

4 total: 1 primary + 3 secondary replicas. All have local SSD copies. One secondary is exposed as a free read-only replica.

1 / 10

Quiz

Q1/6

0 correct

What HA technology powers SQL Managed Instance Business Critical tier?

General Purpose vs Business Critical HA​

General Purpose Tier​

Business Critical Tier​

VNet Architecture​

Subnet Requirements​

Failover Groups for Managed Instance​

MI Link Feature​

Performance Comparison​

Anti-Patterns​

Migration Between Tiers / HA Postures​

Real Scenarios​

Flashcards​

Quiz​

General Purpose vs Business Critical HA

General Purpose Tier

Business Critical Tier

VNet Architecture

Subnet Requirements

Failover Groups for Managed Instance

MI Link Feature

Performance Comparison

Anti-Patterns

Migration Between Tiers / HA Postures

Real Scenarios

Flashcards

Quiz