Skip to main content

SQL Server Engine & Memory Architecture

Understanding how SQL Server manages memory, scheduling, and the buffer pool is foundational for performance tuning and troubleshooting.

SQL Server Memory Architecture

SQLOS - The Operating System Within

SQL Server runs its own lightweight operating system layer called SQLOS (SQL Server Operating System). It sits between the SQL Server engine and Windows, managing:

SQLOS Components
*
Scheduler
- One per logical CPU
- Manages worker threads
- Cooperative (non-preemptive) scheduling
- Visible via sys.dm_os_schedulers
*
Workers & Tasks
- Worker = thread assigned to a scheduler
- Task = unit of work (query, batch)
- Max worker threads configurable
- Default: 0 (auto-calculated)
*
Wait Stats
- Core diagnostics tool
- Every wait is categorized
- sys.dm_os_wait_stats
- Key: PAGEIOLATCH, CXPACKET, LCK_M_*

How scheduling works:

  1. SQL Server creates one scheduler per logical CPU (visible in sys.dm_os_schedulers)
  2. Each scheduler runs one worker at a time (cooperative/non-preemptive)
  3. Workers voluntarily yield the CPU - SQL Server doesn't preempt them
  4. When a worker needs to wait (I/O, lock, memory), it records a wait type and yields
🔀 Oracle → Azure

Oracle parallel: SQLOS schedulers are similar to Oracle's process model with PMON, SMON, etc. - but SQL Server uses threads instead of processes, which is more memory-efficient. Oracle's V$SESSION_WAIT maps conceptually to SQL Server's sys.dm_os_wait_stats.

Memory Architecture

SQL Server Memory Regions
*
Buffer Pool
- Largest memory consumer
- Caches data pages (8KB each)
- Managed via LRU algorithm
- Goal: keep hot data in memory
- Monitor: sys.dm_os_buffer_descriptors
*
Plan Cache
- Stores compiled execution plans
- Avoids recompilation overhead
- Can grow large (plan bloat)
- Monitor: sys.dm_exec_cached_plans
- Clear: DBCC FREEPROCCACHE
*
Lock Memory
- Tracks all active locks
- Allocated from buffer pool
- Escalation: row -> page -> table
- Monitor: sys.dm_tran_locks
*
Other Memory Clerks
- CLR memory (for .NET objects)
- Sort/hash memory (query workspace)
- Network buffers
- Monitor: sys.dm_os_memory_clerks

Memory Configuration

SettingPurposeRecommendation
Max Server MemoryUpper bound for buffer poolTotal RAM - OS - other apps (leave 4-8 GB for OS)
Min Server MemoryLower bound - SQL won't release below thisSet to 50-70% of max in shared environments
Lock Pages in MemoryPrevents Windows from paging SQL memoryAlways enable on production servers
Best Practice

Always set Max Server Memory. By default it's 2,147,483,647 MB (~2 TB). SQL Server will consume all available memory and not release it voluntarily. On a 64 GB server, set it to ~56 GB (leaving ~8 GB for OS and other processes). Reference: Microsoft Learn: Server Memory Configuration

Buffer Pool Deep Dive

The buffer pool is where SQL Server caches 8 KB data pages read from disk. Every query reads from the buffer pool - if the page isn't there, SQL Server reads it from disk (a physical read) and caches it.

How a Page Read Works
*
1. Check Buffer Pool
Query needs page 1:234:5
Hash lookup in buffer pool
If found -> logical read (fast)
*
2. Physical Read
Page not in memory
Issue async I/O to disk
Wait type: PAGEIOLATCH_SH
*
3. Cache the Page
Place page in buffer pool
Mark as clean (not modified)
Available for future reads
*
4. Dirty Pages
Modifications make page "dirty"
Checkpoint writes dirty pages to disk
Lazy writer also flushes under memory pressure

Key Buffer Pool Metrics

MetricWhat It Tells YouTarget
Page Life Expectancy (PLE)How long a page stays in buffer pool (seconds)> 300 (but depends on buffer pool size)
Buffer Cache Hit Ratio% of reads served from memory> 99% for OLTP
Checkpoint Pages/secDirty pages written per secondSteady, not spiking
Page Reads/secPhysical reads from diskLower is better
Lazy Writes/secEvictions due to memory pressureShould be near 0
🎯 Exam Focus

PLE (Page Life Expectancy) is the most commonly cited metric for buffer pool health. The old rule of thumb was "PLE > 300 seconds." A common community guideline (not official Microsoft documentation) suggests: PLE should be > 300 x (buffer pool size in GB / 4). So for a 32 GB buffer pool, target PLE > 2,400 seconds. Use this as a rough benchmark, not an absolute rule.

Checkpoint & Recovery

SQL Server uses Write-Ahead Logging (WAL) - every change is written to the transaction log before it's written to the data file.

Checkpoint Process
*
Transaction Log
- Every change logged first (WAL)
- Sequential writes (fast)
- Durability guarantee
- Recovery uses log to redo/undo
*
Checkpoint
- Writes all dirty pages to disk
- Automatic (~1 minute interval)
- Recovery interval setting controls frequency
- Reduces recovery time on crash
*
Recovery Process
- On startup: REDO committed transactions
- Then: UNDO uncommitted transactions
- Uses the transaction log
- More dirty pages = longer recovery
🔀 Oracle → Azure

Oracle parallel: SQL Server's checkpoint is similar to Oracle's DBWR (database writer) process. The transaction log is equivalent to Oracle's redo log. SQL Server's recovery process maps to Oracle's instance recovery with redo (roll forward) and undo (roll back) phases.


Flashcards

What is SQLOS?
Click to reveal answer
SQL Server Operating System - a lightweight OS layer that manages scheduling, memory, and I/O between SQL Server and Windows. Uses cooperative (non-preemptive) scheduling with one scheduler per logical CPU.
1 / 8