Skip to main content

Index Design & Query Tuning Deep Dive

How SQL Server Finds Data β€” The Foundation​

Data Access Methods
🐒
Table Scan
Reads every page. O(n) cost. No useful index exists for the query predicate.
🎯
Index Seek
B-tree navigation: 3-4 page reads. O(log n). Index matches the WHERE clause.
βœ…
Covering Index
Seek + all columns in index (INCLUDE). Zero Key Lookups. Best possible performance.

Index Architecture β€” B-Tree Structure​

B-Tree Index Structure
πŸ›οΈ
Root Node
Entry point. Contains key ranges pointing to intermediate pages. Single page.
πŸ“Ž
Intermediate Levels
Navigate key ranges to narrow down to the correct leaf page. 1-2 levels typical.
πŸƒ
Leaf Level
Contains actual data (clustered) or row pointers (non-clustered). Doubly-linked list for range scans.

Key insight: B-tree depth is typically 3-4 levels even for millions of rows. An Index Seek reads 3-4 pages (microseconds) vs a Table Scan reading thousands of pages.

The Key Lookup Problem​

The #1 performance killer after missing indexes.

The Key Lookup Problem
πŸ”
Index Seek on City
NC index on City finds 500 matching rows
Fast: only 3-4 page reads
⚠️
500 Key Lookups
Name and Email not in index
Each row requires a random I/O back to clustered index
βœ…
Fix: INCLUDE Columns
CREATE INDEX IX_City ON Customers(City) INCLUDE (Name, Email)
Covering index eliminates all lookups

The problem: Index on City finds 500 rows quickly, but Name and Email aren't in the index. SQL Server must do 500 individual lookups back to the clustered index β€” each one is a random I/O.

The solution: Add INCLUDE columns to the index:

-- Before: Key Lookup for every row
CREATE INDEX IX_City ON Customers(City);

-- After: Covering index β€” no Key Lookup needed
CREATE INDEX IX_City ON Customers(City) INCLUDE (Name, Email);
🎯 Exam Focus

INCLUDE columns are stored ONLY at the leaf level of the index (not in the B-tree navigation structure). They make the index "cover" the query without making the B-tree wider. If you see Key Lookups in an execution plan β†’ add the missing columns to INCLUDE.

SARGable vs Non-SARGable Queries​

SARGable (Search ARGument able) = the query can use an index seek. Non-SARGable = forced to scan.

PatternSARGable?Why
WHERE LastName = 'Smith'βœ… YesDirect equality match
WHERE LastName LIKE 'Sm%'βœ… YesLeading characters known
WHERE OrderDate >= '2025-01-01'βœ… YesRange on indexed column
WHERE YEAR(OrderDate) = 2025❌ NoFunction on column β€” can't seek
WHERE LastName LIKE '%mith'❌ NoLeading wildcard β€” must scan
WHERE Price * 1.1 > 100❌ NoExpression on column
WHERE ISNULL(City, 'N/A') = 'Cairo'❌ NoFunction wrapping column
WHERE CAST(OrderDate AS DATE) = '2025-01-01'❌ NoCAST on column

How to Fix Non-SARGable Queries​

Non-SARGableSARGable Rewrite
WHERE YEAR(OrderDate) = 2025WHERE OrderDate >= '2025-01-01' AND OrderDate < '2026-01-01'
WHERE LastName LIKE '%mith'Use Full-Text Search or reverse the string
WHERE Price * 1.1 > 100WHERE Price > 100 / 1.1
WHERE ISNULL(City, 'N/A') = 'Cairo'WHERE City = 'Cairo'
🎯 Exam Focus

Golden rule: Never apply a function/expression to the LEFT side of a WHERE clause on an indexed column. Move the function to the RIGHT side (constant side). This is heavily tested on DP-300.

🏒 Real-World DBA Note

Oracle DBA parallel: SARGability in SQL Server = Oracle's ability to use "index range scan" vs "full table scan". The same rule applies in Oracle β€” WHERE TRUNC(date_col) = DATE won't use the index. The fix is the same: use range predicates instead of functions on columns.

Index Design Strategy​

The Decision Checklist​

Index Design Checklist
🎯
Equality Columns First
Put most selective equality columns (=) first in composite index key
πŸ“ˆ
Range Columns Last
Range predicates (>, <, BETWEEN) go after equality columns in the key
πŸ“‹
INCLUDE for SELECT
Add columns from SELECT list as INCLUDE to avoid Key Lookups
πŸ—‘οΈ
Drop Overlapping
IX(A,B) is redundant if IX(A,B,C) exists. Drop the shorter one.

Column Order Matters!​

The order of columns in a composite index is critical:

Index: IX_Orders (CustomerID, OrderDate, Status)
QueryUses Index?Why
WHERE CustomerID = 5βœ… SeekLeftmost column
WHERE CustomerID = 5 AND OrderDate > '2025-01-01'βœ… SeekLeft-to-right prefix
WHERE OrderDate > '2025-01-01'❌ ScanSkips leftmost column
WHERE Status = 'Active'❌ ScanSkips first 2 columns
WHERE CustomerID = 5 AND Status = 'Active'βœ… Partial seekSeeks on CustomerID, scans Status
🎯 Exam Focus

Put the most selective (equality) columns FIRST in a composite index. Range columns go last. Example: INDEX (CustomerID, Status, OrderDate) β€” equality, equality, range.

Index Anti-Patterns​

Anti-PatternProblemFix
Too many indexesSlows INSERT/UPDATE/DELETE (each index must be updated)Drop unused indexes (check sys.dm_db_index_usage_stats)
Overlapping indexesIX1(A, B) and IX2(A, B, C) β€” IX1 is redundantDrop the shorter one
Wide indexes (many columns in key)Larger B-tree, slower maintenanceMove non-key columns to INCLUDE
Index on low-cardinality columnIX(Gender) β€” only 2 values, scan anywayRemove or combine with high-cardinality column
No index maintenanceFragmented indexes = slow readsSchedule REORGANIZE/REBUILD

Wait Statistics β€” Systematic Troubleshooting​

Wait stats tell you what SQL Server is waiting for β€” the most powerful diagnostic tool.

Wait Statistics Troubleshooting
πŸ’Ύ
PAGEIOLATCH
Waiting for data page from disk. Fix: add missing indexes, scale storage IOPS.
πŸ”’
LCK_M_S / LCK_M_X
Waiting for row/page lock (blocking). Fix: shorten transactions, enable RCSI.
βš™οΈ
CXPACKET
Parallelism synchronization overhead. Fix: lower MAXDOP, update statistics.
πŸ’»
SOS_SCHEDULER_YIELD
CPU-bound query yielding scheduler. Fix: optimize query, scale compute.
πŸ“
WRITELOG
Waiting for log flush to disk. Fix: scale to higher tier for more log throughput.
🧠
RESOURCE_SEMAPHORE
Waiting for memory grant. Fix: reduce query memory needs, scale up RAM.

Top Wait Types Cheat Sheet​

Wait TypeMeansOracle EquivalentFix
PAGEIOLATCH_SH/EXWaiting for data page from diskdb file sequential readAdd missing indexes, scale storage IOPS
LCK_M_S / LCK_M_XWaiting for lock (blocking)enq: TX - row lock contentionFix long transactions, use RCSI
CXPACKETParallelism overheadPX Deq: Execute ReplyLower MAXDOP, update stats
SOS_SCHEDULER_YIELDCPU-bound queryCPU used by this sessionOptimize query, scale compute
WRITELOGWaiting for log flushlog file syncScale to higher tier (more log throughput)
RESOURCE_SEMAPHOREWaiting for memory grantPGA memoryReduce query memory needs, scale up
THREADPOOLNo available worker threadsSession limit hitReduce connections, connection pooling
IO_COMPLETIONWaiting for I/Odirect path read/writeScale storage, reduce I/O
🎯 Exam Focus

Exam pattern: "Database is slow, top wait is X. What should you do?" β†’ Map the wait type to the fix. PAGEIOLATCH = I/O/indexes. LCK = blocking. CXPACKET = MAXDOP. SOS_SCHEDULER = CPU. WRITELOG = log throughput.

Blocking & Deadlocks​

Blocking Chain​

Blocking Chain
πŸ”’
Session A Holds Lock
BEGIN TRAN + UPDATE on Row 1
Exclusive lock held, transaction not committed
⏳
Session B Waits
SELECT on same Row 1
Blocked by Session A's exclusive lock (LCK_M_S wait)
⏳
Session C Waits
Also needs Row 1
Blocked by Session A β†’ chain grows
βœ…
Fix: RCSI or Short Txns
RCSI: readers get versioned snapshot, no blocking
Keep transactions short, commit quickly

How to investigate:

  • sys.dm_exec_requests β†’ blocking_session_id column shows who's blocking whom
  • sys.dm_tran_locks β†’ shows what locks are held and requested
  • Read Committed Snapshot Isolation (RCSI) = the best fix for most blocking scenarios

RCSI β€” The Blocking Eliminator​

Isolation LevelReaders Block Writers?Writers Block Readers?Row Versioning?
Read Committed (default)βŒβœ… Yes (readers wait)❌
Read Committed Snapshot (RCSI)❌❌ No blocking!βœ… (tempdb)
SnapshotβŒβŒβœ… (tempdb)
🎯 Exam Focus

RCSI is enabled by default in Azure SQL DB and MI. On SQL Server VMs, you must enable it: ALTER DATABASE mydb SET READ_COMMITTED_SNAPSHOT ON. This is a major exam topic β€” Azure SQL = no reader/writer blocking by default.

Deadlocks​

Deadlock β€” Circular Wait
πŸ”„
Session A β†’ B
Session A holds lock on Resource 1, waits for Resource 2 (held by B)
πŸ”„
Session B β†’ A
Session B holds lock on Resource 2, waits for Resource 1 (held by A). SQL kills the cheaper victim.

Deadlock resolution strategy:

  1. Check system_health XEvent session (always captures deadlock graphs)
  2. Analyze the deadlock graph β€” identify the two resources and sessions
  3. Fix by: accessing tables in consistent order, keeping transactions short, adding appropriate indexes

TempDB Optimization​

TempDB is used for sorts, hash joins, version store (RCSI), temp tables, and spills.

TempDB Contention Pattern​

TempDB Contention
⚠️
PAGELATCH Waits
Contention on PFS/GAM/SGAM allocation pages when many sessions create temp objects
πŸ“
Multiple Data Files
Add 1 file per CPU core (up to 8). All same size for even round-robin allocation.
☁️
Azure SQL DB/MI
Microsoft manages TempDB config automatically. On VMs: YOU must configure this.

TempDB best practices:

  • Multiple data files β€” 1 per CPU core (up to 8 files for most)
  • All files same size (ensures even round-robin allocation)
  • Pre-size files β€” don't rely on autogrow
  • In Azure SQL DB/MI: Microsoft manages this for you
  • On SQL VMs: you must configure this manually
🎯 Exam Focus

If the exam mentions PAGELATCH waits in tempdb β†’ add more tempdb data files. If it mentions tempdb filling up β†’ check for long-running transactions (version store), large sorts (missing indexes), or explicit temp table abuse.

Parameter Sniffing β€” Detection & Solutions​

Parameter Sniffing
πŸ“
First Execution
@Country = 'Luxembourg' (10 rows)
Optimizer creates plan optimized for small result set
πŸ“¦
Plan Cached
Nested Loop plan cached for reuse
Works great for small datasets
πŸ’₯
Second Execution
@Country = 'USA' (5M rows)
Reuses small-data plan β†’ catastrophically slow
βœ…
Fix Options
RECOMPILE, OPTIMIZE FOR UNKNOWN
Parameter Sensitive Plan (compat 160)

Solutions (from least to most aggressive)​

SolutionHowWhen
Update StatisticsUPDATE STATISTICS tableStale stats causing bad estimates
OPTIMIZE FOR UNKNOWNQuery hintRemoves sniffing β€” uses average stats
RECOMPILEOPTION(RECOMPILE) or sp-levelAlways gets fresh plan (CPU cost)
Plan GuidesForce a specific planEmergency fix for vendor code you can't change
Parameter Sensitive Plan (IQP)Compat level 160Best: automatic multiple plan variants

Database-Scoped Configuration​

Important settings that control query behavior per database.

SettingWhat It DoesRecommended
MAXDOPMax parallelism per query8 or less (depends on cores)
LEGACY_CARDINALITY_ESTIMATIONUse old CE modelOFF (unless regressions)
OPTIMIZE_FOR_AD_HOC_WORKLOADSCache only plan stubs for one-time queriesON (saves plan cache memory)
QUERY_OPTIMIZER_HOTFIXESEnable latest optimizer fixesON
LAST_QUERY_PLAN_STATSCapture last actual planON (enables sys.dm_exec_query_plan_stats)
🎯 Exam Focus

OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON prevents single-use query plans from bloating the plan cache. Instead of caching the full plan on first execution, it caches a small "stub." Full plan is only cached on second execution. Exam-tested!

DBCC CHECKDB β€” Database Integrity​

Verifies database integrity β€” catches corruption before it causes data loss.

CommandWhat It ChecksImpact
CHECKDBEverything (allocation, structural, logical)Heavy β€” schedule off-peak
CHECKTABLESingle tableMedium β€” targeted fix
CHECKALLOCAllocation consistencyLight
CHECKFILEGROUPOne filegroupMedium

Strategy: Run CHECKDB weekly on SQL VMs (Azure SQL DB/MI does this automatically). Check msdb.dbo.suspect_pages for known corruption.

🎯 Exam Focus

In Azure SQL DB and MI, Microsoft runs integrity checks automatically β€” you don't need to schedule DBCC CHECKDB. On SQL VMs, YOU must schedule it (via SQL Agent). This is an exam trap β€” knowing who's responsible for what.


Anti-Patterns​

  • "Add an index for every slow query." Indexes cost writes + storage. A 30-column wide non-clustered index on a write-heavy OLTP table can collapse insert performance. Audit sys.dm_db_index_usage_stats quarterly and drop unused indexes.
  • "sys.dm_db_missing_index_* recommendations β€” just create them all." They're suggestions from individual query plans, not a holistic view. Two recommendations may overlap; create the union, not both. Pair with Database Tuning Advisor or analysis of QS top queries.
  • "Wrap WHERE in a function for readability." WHERE YEAR(order_date) = 2026 is non-SARGable β€” forces a scan. Rewrite to WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01' to enable a seek.
  • "OPTION (RECOMPILE) everywhere parameter sniffing bites us." Eliminates plan caching for that statement. CPU cost adds up. Use Optimize for Ad Hoc Workloads at server level + Query Store forced plans for the worst offenders.
  • "Drop the clustered index because the query plan goes faster without it." Heap tables grow fragmented and forwarded_record_count increases page reads invisibly. Almost always wrong outside of staging tables.
  • "Update statistics nightly = fine." Auto-update kicks in at ~20 % row change. For volatile tables that's every few hours. Add explicit UPDATE STATISTICS WITH FULLSCAN after large loads, not just nightly.
⚠️ Watch Out

Filtered indexes on Azure SQL DB don't always satisfy queries with parameters. The optimizer needs the predicate to be a constant match. Parameterized queries may bypass the filtered index unless OPTION (RECOMPILE) is added. Test before relying on it.


Migration Between Tuning Approaches​

From β†’ ToPathCost
Heap β†’ Clustered IndexCREATE CLUSTERED INDEX (online, Enterprise/MI/Azure SQL)Online; gain seek + reduce forwarded records
Wide non-clustered index β†’ Covering index with INCLUDERecreate with leaf-level INCLUDESame key, fewer key-lookup operators
Rowstore β†’ Columnstore (analytics)CREATE CLUSTERED COLUMNSTORE INDEXDrops rowstore CI; analytical query gains; OLTP loses
Auto stats only β†’ Manual UPDATE STATISTICS WITH FULLSCANSchedule via SQL Agent / Elastic JobMore CPU during run; better cardinality estimates
Plan cache reliance β†’ Query Store + forced plansEnable QS, force a known-good planSurvives recompiles; small QS storage cost
Manual missing-index review β†’ Database Watcher recommendationsWatcher highlights aggregated dm_db_missing_index_*Cross-server view; ADX cost
sp_BlitzIndex-style scripts β†’ Database Tuning AdvisorDTA for workload-driven recsHeavy run; plan it for off-hours
Manual DBCC CHECKDB on SQL VM β†’ SQL Agent jobSchedule + send results to Log AnalyticsDetection automated; alerting cost = email/AG

Most expensive moves: rowstore β†’ columnstore (architecture change) and dropping wrong-index sprawl (regression risk).


Real Scenarios​

  1. OLTP order table β€˜slow after release’ β†’ Query Store β€˜Regressed Queries’ β†’ force prior plan + investigate parameter sniffing. Driver: rollback at the plan layer. Trade-off: temporary; investigate root cause same week.
  2. Reporting query reads 4 indexes redundantly β†’ Build one covering index with INCLUDE columns. Driver: collapse 4 index seeks + key lookup into 1 seek. Trade-off: write cost of one wider index.
  3. Star-schema fact table for analytics β†’ Clustered Columnstore Index + partition by date. Driver: 10–20Γ— compression + segment elimination. Trade-off: small singleton inserts go via deltastore (poor performance) β€” batch loads only.
  4. Auto-stats lagging on volatile load β†’ Post-ETL UPDATE STATISTICS ... WITH FULLSCAN on the loaded table. Driver: optimizer needs fresh cardinality. Trade-off: CPU spike during update window.
  5. Estate-wide missing-index review β†’ Database Watcher to ADX Workbook surfacing top dm_db_missing_index_* candidates per DB, ranked by improvement_measure. Driver: rank by impact, not by row count. Trade-off: review cost β€” don't auto-create.

Flashcards​

What is a Key Lookup and how do you fix it?
Click to reveal answer
When a non-clustered index find the row but must go back to the clustered index for additional columns. Fix: add missing columns to INCLUDE in the non-clustered index (creating a 'covering' index).
1 / 10

Quiz​

Q1/6
0 correct
A query on a 10M row table runs slowly. The execution plan shows an Index Seek on CustomerID followed by 50,000 Key Lookups. What's the best fix?