Index Design & Query Tuning Deep Dive

How SQL Server Finds Data — The Foundation

Data Access Methods

🐢

Table Scan

Reads every page. O(n) cost. No useful index exists for the query predicate.

🎯

Index Seek

B-tree navigation: 3-4 page reads. O(log n). Index matches the WHERE clause.

✅

Covering Index

Seek + all columns in index (INCLUDE). Zero Key Lookups. Best possible performance.

Index Architecture — B-Tree Structure

B-Tree Index Structure

🏛️

Root Node

Entry point. Contains key ranges pointing to intermediate pages. Single page.

📎

Intermediate Levels

Navigate key ranges to narrow down to the correct leaf page. 1-2 levels typical.

🍃

Leaf Level

Contains actual data (clustered) or row pointers (non-clustered). Doubly-linked list for range scans.

Key insight: B-tree depth is typically 3-4 levels even for millions of rows. An Index Seek reads 3-4 pages (microseconds) vs a Table Scan reading thousands of pages.

The Key Lookup Problem

The #1 performance killer after missing indexes.

The Key Lookup Problem

🔍

Index Seek on City

NC index on City finds 500 matching rows

Fast: only 3-4 page reads

⚠️

500 Key Lookups

Name and Email not in index

Each row requires a random I/O back to clustered index

✅

Fix: INCLUDE Columns

CREATE INDEX IX_City ON Customers(City) INCLUDE (Name, Email)

Covering index eliminates all lookups

The problem: Index on City finds 500 rows quickly, but Name and Email aren't in the index. SQL Server must do 500 individual lookups back to the clustered index — each one is a random I/O.

The solution: Add INCLUDE columns to the index:

-- Before: Key Lookup for every row
CREATE INDEX IX_City ON Customers(City);

-- After: Covering index — no Key Lookup needed
CREATE INDEX IX_City ON Customers(City) INCLUDE (Name, Email);

🎯 Exam Focus

INCLUDE columns are stored ONLY at the leaf level of the index (not in the B-tree navigation structure). They make the index "cover" the query without making the B-tree wider. If you see Key Lookups in an execution plan → add the missing columns to INCLUDE.

SARGable vs Non-SARGable Queries

SARGable (Search ARGument able) = the query can use an index seek. Non-SARGable = forced to scan.

Pattern	SARGable?	Why
`WHERE LastName = 'Smith'`	✅ Yes	Direct equality match
`WHERE LastName LIKE 'Sm%'`	✅ Yes	Leading characters known
`WHERE OrderDate >= '2025-01-01'`	✅ Yes	Range on indexed column
`WHERE YEAR(OrderDate) = 2025`	❌ No	Function on column — can't seek
`WHERE LastName LIKE '%mith'`	❌ No	Leading wildcard — must scan
`WHERE Price * 1.1 > 100`	❌ No	Expression on column
`WHERE ISNULL(City, 'N/A') = 'Cairo'`	❌ No	Function wrapping column
`WHERE CAST(OrderDate AS DATE) = '2025-01-01'`	❌ No	CAST on column

How to Fix Non-SARGable Queries

Non-SARGable	SARGable Rewrite
`WHERE YEAR(OrderDate) = 2025`	`WHERE OrderDate >= '2025-01-01' AND OrderDate < '2026-01-01'`
`WHERE LastName LIKE '%mith'`	Use Full-Text Search or reverse the string
`WHERE Price * 1.1 > 100`	`WHERE Price > 100 / 1.1`
`WHERE ISNULL(City, 'N/A') = 'Cairo'`	`WHERE City = 'Cairo'`

🎯 Exam Focus

Golden rule: Never apply a function/expression to the LEFT side of a WHERE clause on an indexed column. Move the function to the RIGHT side (constant side). This is heavily tested on DP-300.

🏢 Real-World DBA Note

Oracle DBA parallel: SARGability in SQL Server = Oracle's ability to use "index range scan" vs "full table scan". The same rule applies in Oracle — WHERE TRUNC(date_col) = DATE won't use the index. The fix is the same: use range predicates instead of functions on columns.

Index Design Strategy

The Decision Checklist

Index Design Checklist

🎯

Equality Columns First

Put most selective equality columns (=) first in composite index key

📈

Range Columns Last

Range predicates (>, <, BETWEEN) go after equality columns in the key

📋

INCLUDE for SELECT

Add columns from SELECT list as INCLUDE to avoid Key Lookups

🗑️

Drop Overlapping

IX(A,B) is redundant if IX(A,B,C) exists. Drop the shorter one.

Column Order Matters!

The order of columns in a composite index is critical:

Index: IX_Orders (CustomerID, OrderDate, Status)

Query	Uses Index?	Why
`WHERE CustomerID = 5`	✅ Seek	Leftmost column
`WHERE CustomerID = 5 AND OrderDate > '2025-01-01'`	✅ Seek	Left-to-right prefix
`WHERE OrderDate > '2025-01-01'`	❌ Scan	Skips leftmost column
`WHERE Status = 'Active'`	❌ Scan	Skips first 2 columns
`WHERE CustomerID = 5 AND Status = 'Active'`	✅ Partial seek	Seeks on CustomerID, scans Status

🎯 Exam Focus

Put the most selective (equality) columns FIRST in a composite index. Range columns go last. Example: INDEX (CustomerID, Status, OrderDate) — equality, equality, range.

Index Anti-Patterns

Anti-Pattern	Problem	Fix
Too many indexes	Slows INSERT/UPDATE/DELETE (each index must be updated)	Drop unused indexes (check `sys.dm_db_index_usage_stats`)
Overlapping indexes	`IX1(A, B)` and `IX2(A, B, C)` — IX1 is redundant	Drop the shorter one
Wide indexes (many columns in key)	Larger B-tree, slower maintenance	Move non-key columns to INCLUDE
Index on low-cardinality column	`IX(Gender)` — only 2 values, scan anyway	Remove or combine with high-cardinality column
No index maintenance	Fragmented indexes = slow reads	Schedule REORGANIZE/REBUILD

Wait Statistics — Systematic Troubleshooting

Wait stats tell you what SQL Server is waiting for — the most powerful diagnostic tool.

Wait Statistics Troubleshooting

💾

PAGEIOLATCH

Waiting for data page from disk. Fix: add missing indexes, scale storage IOPS.

🔒

LCK_M_S / LCK_M_X

Waiting for row/page lock (blocking). Fix: shorten transactions, enable RCSI.

⚙️

CXPACKET

Parallelism synchronization overhead. Fix: lower MAXDOP, update statistics.

💻

SOS_SCHEDULER_YIELD

CPU-bound query yielding scheduler. Fix: optimize query, scale compute.

📝

WRITELOG

Waiting for log flush to disk. Fix: scale to higher tier for more log throughput.

🧠

RESOURCE_SEMAPHORE

Waiting for memory grant. Fix: reduce query memory needs, scale up RAM.

Top Wait Types Cheat Sheet

Wait Type	Means	Oracle Equivalent	Fix
PAGEIOLATCH_SH/EX	Waiting for data page from disk	`db file sequential read`	Add missing indexes, scale storage IOPS
LCK_M_S / LCK_M_X	Waiting for lock (blocking)	`enq: TX - row lock contention`	Fix long transactions, use RCSI
CXPACKET	Parallelism overhead	`PX Deq: Execute Reply`	Lower MAXDOP, update stats
SOS_SCHEDULER_YIELD	CPU-bound query	`CPU used by this session`	Optimize query, scale compute
WRITELOG	Waiting for log flush	`log file sync`	Scale to higher tier (more log throughput)
RESOURCE_SEMAPHORE	Waiting for memory grant	`PGA memory`	Reduce query memory needs, scale up
THREADPOOL	No available worker threads	Session limit hit	Reduce connections, connection pooling
IO_COMPLETION	Waiting for I/O	`direct path read/write`	Scale storage, reduce I/O

🎯 Exam Focus

Exam pattern: "Database is slow, top wait is X. What should you do?" → Map the wait type to the fix. PAGEIOLATCH = I/O/indexes. LCK = blocking. CXPACKET = MAXDOP. SOS_SCHEDULER = CPU. WRITELOG = log throughput.

Blocking & Deadlocks

Blocking Chain

🔒

Session A Holds Lock

BEGIN TRAN + UPDATE on Row 1

Exclusive lock held, transaction not committed

⏳

Session B Waits

SELECT on same Row 1

Blocked by Session A's exclusive lock (LCK_M_S wait)

⏳

Session C Waits

Also needs Row 1

Blocked by Session A → chain grows

✅

Fix: RCSI or Short Txns

RCSI: readers get versioned snapshot, no blocking

Keep transactions short, commit quickly

How to investigate:

sys.dm_exec_requests → blocking_session_id column shows who's blocking whom
sys.dm_tran_locks → shows what locks are held and requested
Read Committed Snapshot Isolation (RCSI) = the best fix for most blocking scenarios

RCSI — The Blocking Eliminator

Isolation Level	Readers Block Writers?	Writers Block Readers?	Row Versioning?
Read Committed (default)	❌	✅ Yes (readers wait)	❌
Read Committed Snapshot (RCSI)	❌	❌ No blocking!	✅ (tempdb)
Snapshot	❌	❌	✅ (tempdb)

🎯 Exam Focus

RCSI is enabled by default in Azure SQL DB and MI. On SQL Server VMs, you must enable it: ALTER DATABASE mydb SET READ_COMMITTED_SNAPSHOT ON. This is a major exam topic — Azure SQL = no reader/writer blocking by default.

Deadlocks

Deadlock — Circular Wait

🔄

Session A → B

Session A holds lock on Resource 1, waits for Resource 2 (held by B)

🔄

Session B → A

Session B holds lock on Resource 2, waits for Resource 1 (held by A). SQL kills the cheaper victim.

Deadlock resolution strategy:

Check system_health XEvent session (always captures deadlock graphs)
Analyze the deadlock graph — identify the two resources and sessions
Fix by: accessing tables in consistent order, keeping transactions short, adding appropriate indexes

TempDB Optimization

TempDB is used for sorts, hash joins, version store (RCSI), temp tables, and spills.

TempDB Contention Pattern

TempDB Contention

⚠️

PAGELATCH Waits

Contention on PFS/GAM/SGAM allocation pages when many sessions create temp objects

📁

Multiple Data Files

Add 1 file per CPU core (up to 8). All same size for even round-robin allocation.

☁️

Azure SQL DB/MI

Microsoft manages TempDB config automatically. On VMs: YOU must configure this.

TempDB best practices:

Multiple data files — 1 per CPU core (up to 8 files for most)
All files same size (ensures even round-robin allocation)
Pre-size files — don't rely on autogrow
In Azure SQL DB/MI: Microsoft manages this for you
On SQL VMs: you must configure this manually

🎯 Exam Focus

If the exam mentions PAGELATCH waits in tempdb → add more tempdb data files. If it mentions tempdb filling up → check for long-running transactions (version store), large sorts (missing indexes), or explicit temp table abuse.

Parameter Sniffing — Detection & Solutions

Parameter Sniffing

📝

First Execution

@Country = 'Luxembourg' (10 rows)

Optimizer creates plan optimized for small result set

📦

Plan Cached

Nested Loop plan cached for reuse

Works great for small datasets

💥

Second Execution

@Country = 'USA' (5M rows)

Reuses small-data plan → catastrophically slow

✅

Fix Options

RECOMPILE, OPTIMIZE FOR UNKNOWN

Parameter Sensitive Plan (compat 160)

Solutions (from least to most aggressive)

Solution	How	When
Update Statistics	`UPDATE STATISTICS table`	Stale stats causing bad estimates
OPTIMIZE FOR UNKNOWN	Query hint	Removes sniffing — uses average stats
RECOMPILE	`OPTION(RECOMPILE)` or sp-level	Always gets fresh plan (CPU cost)
Plan Guides	Force a specific plan	Emergency fix for vendor code you can't change
Parameter Sensitive Plan (IQP)	Compat level 160	Best: automatic multiple plan variants

Database-Scoped Configuration

Important settings that control query behavior per database.

Setting	What It Does	Recommended
MAXDOP	Max parallelism per query	8 or less (depends on cores)
LEGACY_CARDINALITY_ESTIMATION	Use old CE model	OFF (unless regressions)
OPTIMIZE_FOR_AD_HOC_WORKLOADS	Cache only plan stubs for one-time queries	ON (saves plan cache memory)
QUERY_OPTIMIZER_HOTFIXES	Enable latest optimizer fixes	ON
LAST_QUERY_PLAN_STATS	Capture last actual plan	ON (enables `sys.dm_exec_query_plan_stats`)

🎯 Exam Focus

OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON prevents single-use query plans from bloating the plan cache. Instead of caching the full plan on first execution, it caches a small "stub." Full plan is only cached on second execution. Exam-tested!

DBCC CHECKDB — Database Integrity

Verifies database integrity — catches corruption before it causes data loss.

Command	What It Checks	Impact
CHECKDB	Everything (allocation, structural, logical)	Heavy — schedule off-peak
CHECKTABLE	Single table	Medium — targeted fix
CHECKALLOC	Allocation consistency	Light
CHECKFILEGROUP	One filegroup	Medium

Strategy: Run CHECKDB weekly on SQL VMs (Azure SQL DB/MI does this automatically). Check msdb.dbo.suspect_pages for known corruption.

🎯 Exam Focus

In Azure SQL DB and MI, Microsoft runs integrity checks automatically — you don't need to schedule DBCC CHECKDB. On SQL VMs, YOU must schedule it (via SQL Agent). This is an exam trap — knowing who's responsible for what.

Anti-Patterns

"Add an index for every slow query." Indexes cost writes + storage. A 30-column wide non-clustered index on a write-heavy OLTP table can collapse insert performance. Audit sys.dm_db_index_usage_stats quarterly and drop unused indexes.
"sys.dm_db_missing_index_* recommendations — just create them all." They're suggestions from individual query plans, not a holistic view. Two recommendations may overlap; create the union, not both. Pair with Database Tuning Advisor or analysis of QS top queries.
"Wrap WHERE in a function for readability." WHERE YEAR(order_date) = 2026 is non-SARGable — forces a scan. Rewrite to WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01' to enable a seek.
"OPTION (RECOMPILE) everywhere parameter sniffing bites us." Eliminates plan caching for that statement. CPU cost adds up. Use Optimize for Ad Hoc Workloads at server level + Query Store forced plans for the worst offenders.
"Drop the clustered index because the query plan goes faster without it." Heap tables grow fragmented and forwarded_record_count increases page reads invisibly. Almost always wrong outside of staging tables.
"Update statistics nightly = fine." Auto-update kicks in at ~20 % row change. For volatile tables that's every few hours. Add explicit UPDATE STATISTICS WITH FULLSCAN after large loads, not just nightly.

⚠️ Watch Out

Filtered indexes on Azure SQL DB don't always satisfy queries with parameters. The optimizer needs the predicate to be a constant match. Parameterized queries may bypass the filtered index unless OPTION (RECOMPILE) is added. Test before relying on it.

Migration Between Tuning Approaches

From → To	Path	Cost
Heap → Clustered Index	`CREATE CLUSTERED INDEX` (online, Enterprise/MI/Azure SQL)	Online; gain seek + reduce forwarded records
Wide non-clustered index → Covering index with INCLUDE	Recreate with leaf-level INCLUDE	Same key, fewer key-lookup operators
Rowstore → Columnstore (analytics)	`CREATE CLUSTERED COLUMNSTORE INDEX`	Drops rowstore CI; analytical query gains; OLTP loses
Auto stats only → Manual `UPDATE STATISTICS WITH FULLSCAN`	Schedule via SQL Agent / Elastic Job	More CPU during run; better cardinality estimates
Plan cache reliance → Query Store + forced plans	Enable QS, force a known-good plan	Survives recompiles; small QS storage cost
Manual missing-index review → Database Watcher recommendations	Watcher highlights aggregated `dm_db_missing_index_*`	Cross-server view; ADX cost
`sp_BlitzIndex`-style scripts → Database Tuning Advisor	DTA for workload-driven recs	Heavy run; plan it for off-hours
Manual `DBCC CHECKDB` on SQL VM → SQL Agent job	Schedule + send results to Log Analytics	Detection automated; alerting cost = email/AG

Most expensive moves: rowstore → columnstore (architecture change) and dropping wrong-index sprawl (regression risk).

Real Scenarios

OLTP order table ‘slow after release’ → Query Store ‘Regressed Queries’ → force prior plan + investigate parameter sniffing. Driver: rollback at the plan layer. Trade-off: temporary; investigate root cause same week.
Reporting query reads 4 indexes redundantly → Build one covering index with INCLUDE columns. Driver: collapse 4 index seeks + key lookup into 1 seek. Trade-off: write cost of one wider index.
Star-schema fact table for analytics → Clustered Columnstore Index + partition by date. Driver: 10–20× compression + segment elimination. Trade-off: small singleton inserts go via deltastore (poor performance) — batch loads only.
Auto-stats lagging on volatile load → Post-ETL UPDATE STATISTICS ... WITH FULLSCAN on the loaded table. Driver: optimizer needs fresh cardinality. Trade-off: CPU spike during update window.
Estate-wide missing-index review → Database Watcher to ADX Workbook surfacing top dm_db_missing_index_* candidates per DB, ranked by improvement_measure. Driver: rank by impact, not by row count. Trade-off: review cost — don't auto-create.

Flashcards

What is a Key Lookup and how do you fix it?

Click to reveal answer

When a non-clustered index find the row but must go back to the clustered index for additional columns. Fix: add missing columns to INCLUDE in the non-clustered index (creating a 'covering' index).

1 / 10

Quiz

Q1/6

0 correct

A query on a 10M row table runs slowly. The execution plan shows an Index Seek on CustomerID followed by 50,000 Key Lookups. What's the best fix?

How SQL Server Finds Data — The Foundation​

Index Architecture — B-Tree Structure​

The Key Lookup Problem​

SARGable vs Non-SARGable Queries​

How to Fix Non-SARGable Queries​

Index Design Strategy​

The Decision Checklist​

Column Order Matters!​

Index Anti-Patterns​

Wait Statistics — Systematic Troubleshooting​

Top Wait Types Cheat Sheet​

Blocking & Deadlocks​

Blocking Chain​

RCSI — The Blocking Eliminator​

Deadlocks​

TempDB Optimization​

TempDB Contention Pattern​

Parameter Sniffing — Detection & Solutions​

Solutions (from least to most aggressive)​

Database-Scoped Configuration​

DBCC CHECKDB — Database Integrity​

Anti-Patterns​

Migration Between Tuning Approaches​

Real Scenarios​

Flashcards​

Quiz​

How SQL Server Finds Data — The Foundation

Index Architecture — B-Tree Structure

The Key Lookup Problem

SARGable vs Non-SARGable Queries

How to Fix Non-SARGable Queries

Index Design Strategy

The Decision Checklist

Column Order Matters!

Index Anti-Patterns

Wait Statistics — Systematic Troubleshooting

Top Wait Types Cheat Sheet

Blocking & Deadlocks

Blocking Chain

RCSI — The Blocking Eliminator

Deadlocks

TempDB Optimization

TempDB Contention Pattern

Parameter Sniffing — Detection & Solutions

Solutions (from least to most aggressive)

Database-Scoped Configuration

DBCC CHECKDB — Database Integrity

Anti-Patterns

Migration Between Tuning Approaches

Real Scenarios

Flashcards

Quiz