SQL Server on Azure VM — Configuration Best Practices
When you deploy SQL Server on Azure VMs, you are responsible for configuring the operating system, storage, SQL Server settings, and performance tuning. This is where your Oracle DBA skills translate directly — but Azure has specific best practices that differ from on-premises.
VM Size Selection
Recommended VM Series for SQL Server
| VM Series | Best For | Key Feature | vCPUs | Memory |
|---|---|---|---|---|
| Ebdsv5 | OLTP workloads | Best price/perf, Premium SSD + Ultra | 2-104 | 16-672 GB |
| Edsv5 | General purpose | Balanced compute/memory | 2-104 | 16-672 GB |
| M-series | Very large databases | Up to 4 TB RAM | 8-416 | 218 GB-12 TB |
| Lsv3 | Data warehousing | High local NVMe storage | 8-80 | 64-640 GB |
DP-300 key rule: Always choose memory-optimized VMs (E-series or M-series) for SQL Server. Compute-optimized (F-series) or general-purpose (D-series) are NOT recommended for production SQL.
Oracle DBA parallel: This is like choosing your Exadata configuration — but instead of fixed hardware, you pick a VM size. The key difference: you can resize Azure VMs (with a brief restart), so start smaller and scale up based on workload.
Storage Architecture — The Most Critical Decision
Storage configuration is the #1 performance factor for SQL Server on Azure VMs. Getting this wrong causes 80% of performance issues.
Disk Types
| Disk Type | Max IOPS | Max Throughput | Latency | Best For |
|---|---|---|---|---|
| Ultra Disk | 160,000 | 4,000 MB/s | < 1ms | Mission-critical OLTP |
| Premium SSD v2 | 80,000 | 1,200 MB/s | < 1ms | High-performance, flexible |
| Premium SSD | 20,000 | 900 MB/s | ~1-2ms | Most production workloads |
| Standard SSD | 6,000 | 750 MB/s | ~5ms | Dev/test only |
| Standard HDD | 2,000 | 500 MB/s | ~10ms | Never for SQL Server |
The Golden Rules of SQL VM Storage
Recommended Disk Layout
| Drive | Purpose | Disk Type | Caching | Size Guidance |
|---|---|---|---|---|
| C:\ | OS only | Premium SSD | Read-only | 128 GB |
| D:\ | tempdb | Local SSD (ephemeral) | N/A | VM-provided |
| F:\ | Data files (.mdf, .ndf) | Premium SSD or Ultra | Read-only | Based on DB size |
| G:\ | Log files (.ldf) | Premium SSD or Ultra | None | Based on log throughput |
| H:\ | Backups | Standard SSD | None | 2-3x DB size |
Critical: Log file disks must have caching set to None. Read-only or read-write caching on transaction log disks can cause data corruption during a crash because cached writes may not be flushed. This is one of the most common Azure SQL VM misconfigurations.
Oracle DBA parallel: This is like your ASM disk group layout — you wouldn't put DATA and FRA on the same disk group. Same principle: separate data, log, and temp I/O paths. The D:\ drive for tempdb is like using local flash for temp tablespace.
Storage Caching Rules
| File Type | Caching Setting | Why |
|---|---|---|
| Data files (.mdf/.ndf) | Read-only | Speeds up read operations, data pages cached in host memory |
| Log files (.ldf) | None | Write-ahead logging requires guaranteed sequential writes — caching risks corruption |
| tempdb | N/A (local SSD) | D:\ drive has no caching option — it's direct local storage |
| Backups | None | Sequential write workload, caching adds no benefit |
Disk Striping for IOPS
When a single Premium SSD doesn't provide enough IOPS, use Storage Spaces to stripe multiple disks:
| Config | IOPS | Throughput | Use Case |
|---|---|---|---|
| 1x P30 (1 TB) | 5,000 | 200 MB/s | Small databases |
| 4x P30 striped | 20,000 | 800 MB/s | Medium OLTP |
| 8x P30 striped | 40,000 | 1,600 MB/s | Large OLTP |
| Ultra Disk | Up to 160,000 | 4,000 MB/s | Mission-critical (no striping needed) |
Storage Spaces stripe size: Use 64 KB interleave/stripe size for SQL Server data files (matches SQL Server extent size). This is a commonly tested detail on DP-300.
SQL Server Configuration Settings
tempdb Configuration
| Setting | Recommendation | Why |
|---|---|---|
| Number of files | 1 per vCPU (up to 8) | Reduces allocation contention (PFS/GAM/SGAM) |
| File location | D:\ (local SSD) | Fastest I/O, no persistence needed |
| Initial size | Equal across all files | Avoids proportional-fill skew |
| Auto-growth | Equal increment (64 MB or 128 MB) | Keeps files balanced |
| Trace Flag 1118 | Not needed in SQL 2016+ | Mixed extents disabled by default |
Oracle DBA parallel: tempdb files = Oracle temp tablespace datafiles. The "1 per CPU up to 8" rule solves the same contention problem as Oracle's locally managed temp tablespace with multiple temp files. The PFS/GAM contention is similar to Oracle's free space management bitmap contention.
MAXDOP (Max Degree of Parallelism)
| Scenario | Recommended MAXDOP |
|---|---|
| OLTP (high concurrency) | 1 (or 2-4) |
| Data warehouse / reporting | 0 (unlimited) or 8 |
| Mixed workload | 4-8 (never exceed logical CPU count) |
| General starting point | Min(8, logical CPUs) |
Cost Threshold for Parallelism: Increase from the default of 5 to 50 for OLTP workloads. This prevents small queries from going parallel unnecessarily, reducing CXPACKET waits.
Memory Configuration
| Setting | Recommendation | Why |
|---|---|---|
| Max Server Memory | Total RAM - 4 GB (for OS) | Leave headroom for OS and other processes |
| Min Server Memory | 0 or fixed based on workload | Usually leave at 0 |
| Lock Pages in Memory | Enable for production | Prevents Windows from paging SQL buffer pool to disk |
| Instant File Initialization | Enable via SQL service account | Skips zeroing new data files — massive speed improvement |
Instant File Initialization (IFI) only works for data files, NOT log files. Log files are always zero-initialized for recovery safety. Grant the SQL Server service account the SE_MANAGE_VOLUME_NAME privilege (Perform Volume Maintenance Tasks in Local Security Policy).
Important Trace Flags
| Flag | Purpose | Still Needed? |
|---|---|---|
| 1118 | Disable mixed extent allocations | ❌ Default in SQL 2016+ |
| 3226 | Suppress successful backup messages in error log | ✅ Recommended |
| 1117 | Auto-grow all files in filegroup equally | ❌ Default in SQL 2016+ |
| 2371 | Lower threshold for auto-update statistics | ❌ Default in SQL 2016+ |
| 460 | Show actual truncation length in truncation warnings | ✅ Helpful for debugging |
Database Settings
| Setting | Recommendation | Why |
|---|---|---|
| Compatibility Level | Latest (160 for SQL 2022) | Enables IQP, CE improvements |
| Recovery Model | FULL (for production) | Required for AG, point-in-time restore |
| Auto-Create Statistics | ON | Query optimizer needs statistics |
| Auto-Update Statistics | ON | Keep statistics fresh |
| Auto-Update Stats Async | ON (for OLTP) | Prevents query compilation blocking |
| Page Verify | CHECKSUM | Detects I/O corruption |
| Query Store | ON | Flight recorder for query perf |
Networking Best Practices
| Setting | Recommendation |
|---|---|
| Accelerated Networking | Enable on all SQL VMs (reduces latency 2-5x) |
| Proximity Placement Groups | Place AG replicas in same placement group |
| Network bandwidth | Choose VM size with sufficient bandwidth for AG sync |
Maintenance Tasks
| Task | Frequency | Method |
|---|---|---|
| Index maintenance | Weekly | Ola Hallengren scripts or custom |
| Statistics update | Daily or after large loads | sp_updatestats or per-table UPDATE STATISTICS |
| Integrity check | Weekly | DBCC CHECKDB (schedule during low activity) |
| Backup verification | Weekly | RESTORE VERIFYONLY |
| Error log review | Daily | Check for I/O errors, login failures |
Oracle DBA parallel: Index maintenance = ALTER INDEX REBUILD/COALESCE. Statistics update = DBMS_STATS.GATHER_TABLE_STATS. Integrity check = DBMS_REPAIR + ANALYZE TABLE VALIDATE STRUCTURE. Same disciplines, different syntax.
SQL VM Resource Provider
When you register your SQL VM with the SQL IaaS Agent Extension, you get:
| Feature | What It Does |
|---|---|
| Automated Backup | Configurable backups to Azure Storage (schedule, retention, encryption) |
| Automated Patching | Auto-apply Windows + SQL patches in a maintenance window |
| Azure Key Vault Integration | TDE with customer-managed keys from Key Vault |
| Disk Utilization | Portal shows storage IOPS/throughput metrics |
| Flexible Licensing | Switch between PAYG and BYOL (Azure Hybrid Benefit) |
| Storage Configuration | Extend drives and configure best practices from portal |
IaaS Agent extension — mode comparison
| Mode | Portal manageability | Automated Backup | Automated Patching | Compatible with Azure Update Manager |
|---|---|---|---|---|
| Full | ✅ Full | ✅ | ✅ (legacy "Automated Patching") | ❌ Conflicts — must switch off first |
| Lightweight | ✅ Inventory + license type only | ❌ | ❌ | ✅ Use this for AUM |
| NoAgent (legacy) | ❌ | ❌ | ❌ | ✅ but no portal visibility |
SQL VM patching — choose ONE mechanism, not two:
- Automated Patching (Full extension mode) → SQL + Windows patches in a maintenance window managed by the SQL IaaS Agent.
- Azure Update Manager (AUM) → centralized cross-fleet patching managed at subscription scope.
- Manual → you own everything.
You cannot run both Automated Patching and AUM. Common ordering trap: to move a SQL VM from Automated Patching to AUM, disable Automated Patching first (or switch the IaaS extension from Full → Lightweight). Otherwise AUM schedules silently no-op while the legacy agent reboots the VM on a different cadence.
Switching extension mode is one-way friendly only: Lightweight → Full requires uninstalling and re-registering the extension (and that briefly restarts the SQL service). Plan the change in a window. NoAgent is legacy — for any new VM register with at least Lightweight so the portal shows the SQL workload at all.
Checklist: Production SQL VM Configuration
| Category | Item | ✅ |
|---|---|---|
| VM | Memory-optimized series (E or M) | ☐ |
| VM | Accelerated Networking enabled | ☐ |
| Storage | Data files on separate Premium SSD/Ultra (caching: Read-only) | ☐ |
| Storage | Log files on separate Premium SSD/Ultra (caching: None) | ☐ |
| Storage | tempdb on D:\ (local SSD) | ☐ |
| Storage | 64 KB stripe size for Storage Spaces | ☐ |
| SQL | Max Server Memory set (Total - 4 GB) | ☐ |
| SQL | MAXDOP configured per workload type | ☐ |
| SQL | Cost Threshold for Parallelism = 50 | ☐ |
| SQL | Lock Pages in Memory enabled | ☐ |
| SQL | Instant File Initialization enabled | ☐ |
| SQL | tempdb: 1 file per vCPU (up to 8) | ☐ |
| SQL | Query Store enabled | ☐ |
| SQL | Recovery Model = FULL | ☐ |
| SQL | Page Verify = CHECKSUM | ☐ |
| Maintenance | Ola Hallengren scripts installed | ☐ |
| Registration | SQL IaaS Agent Extension (Full mode) | ☐ |
Anti-Patterns
- "Use the OS C: drive for data files — it's already there." C: has read/write caching for the OS. SQL data files on C: corrupt during host crashes. Always use a separate Premium/Ultra data disk with ReadOnly cache (data) or None (logs).
- "Default 64 KB NTFS allocation works fine." It does, but SQL extents are 64 KB — misaligned blocks add IO amplification. Format with
/A:64Kfor SQL data + log volumes. - "Premium SSD v2 — better than Premium SSD, switch everything." PSv2 has no caching support. For log files (no cache anyway) it's a great cost win. For data files that benefit from ReadOnly cache, plain Premium SSD is often faster and cheaper.
- "Skip SQL IaaS Extension Full mode." Lightweight mode = no automated backup, no automated patching, no portal-managed AKV integration. Full mode is free; the only cost is restarting SQL once at install. Always run Full.
- "Use the local D:\ drive for everything except tempdb." D:\ is ephemeral — wiped on stop/deallocate. tempdb is the only legitimate use. User DBs on D:\ = data loss on the next maintenance reboot.
- "Lock pages in memory not needed in Azure." Same OS, same memory pressure. Grant
Lock pages in memoryto the SQL service account on every prod VM.
Azure Backup for SQL VM and SQL native backup-to-URL serve different needs. Azure Backup is policy + GUI + LTR via Recovery Services Vault. Backup-to-URL is T-SQL BACKUP DATABASE ... TO URL for ad-hoc / scripted backups. Don't run both against the same DB without coordination — you fragment the log chain.
Migration Between Configurations
| From → To | Path | Cost |
|---|---|---|
| Standard SSD → Premium SSD | Detach + change SKU + reattach | Brief downtime per disk; significant IOPS gain |
| Premium SSD → Ultra Disk | Recreate VM with Ultra-capable size + region | New VM; data migrated via backup/restore; only worth it for sub-ms latency |
| Single disks → Storage Spaces stripe | Add disks, configure SS pool, migrate files | Offline migration of data files; gain aggregate IOPS / throughput |
| SQL VM with PAYG license → AHB | Switch via Azure portal SQL VM blade or Set-AzSqlVM -LicenseType AHUB | Online; instant ~40–55 % compute saving |
| SQL IaaS Lightweight → Full mode | Re-register extension; SQL service restart | One restart; gain automated patching/backup |
| Azure Hybrid Benefit on → PAYG | Reverse via portal | Online; only do this if you give back the SA license |
| SQL VM → Azure SQL MI | MI Link or LRS or backup/restore | Online via MI Link; offline via LRS/backup; gain PaaS, lose CmdExec/CLR features |
The expensive moves are storage layout changes (data file relocation) and VM size changes that require a different family (e.g., Ebsv5 → Ebdsv5).
Real Scenarios
- Lift-and-shift legacy ERP, 2 TB, mostly read-heavy → E16bds_v5 + 2× P30 striped (data, ReadOnly cache) + 1× P20 (log, None cache) + D:\ tempdb + AHB. Driver: balanced compute, predictable IOPS. Trade-off: striping adds restore complexity.
- High-throughput OLTP, sub-ms log latency required → M-series + Ultra Disk for log + Premium SSD for data. Driver: only Ultra meets the latency SLA. Trade-off: Ultra Disk pricing + region restriction.
- Cost-optimized dev/test, can stop nightly → Bs-series VM, Standard SSD data + log, Auto-Shutdown 8 PM. Driver: 60–70 % cost saving. Trade-off accepted: cold-start + lower IOPS during burst.
- DR replica in paired region → Same VM family, ZRS-managed disks, Always On AG async replica. Driver: regional resiliency. Trade-off: 2× compute + cross-region egress on log shipping.
- ISV product certified on "SQL Server", 4 TB DB, FILESTREAM → SQL VM (no PaaS option due to FILESTREAM). Driver: certification + feature support. Trade-off: customer owns OS patching, AG cluster, LB.