Skip to main content

Advanced Monitoring & Optimization

The Complete Monitoring Stack

Monitoring → Diagnosis → Action
🔍
1. Real-Time Monitoring
DMVs (live sessions, waits)
Extended Events (lightweight tracing)
Database Watcher (near real-time dashboards)
📊
2. Historical Analysis
Query Store (plan history, regressions)
SQL Insights (Azure Monitor deep telemetry)
Intelligent Insights (ML anomaly detection)
🔧
3. Performance Optimization
Automatic Tuning (force plans, auto-index)
IQP (adaptive joins, memory feedback)
Resource Governor (workload isolation)
Statistics Maintenance

Extended Events (XEvents)

The modern, lightweight replacement for SQL Trace/Profiler. Think of it as Oracle's event tracing (10046, 10053) but better.

Architecture

🎯
Events
What to capture
sql_statement_completed
xml_deadlock_report
error_reported
🔍
Predicates
Filters (when to capture)
duration > 5000ms
error_number = 18456
📎
Actions
Extra data to collect
sql_text
plan_handle
session_id
💾
Targets
Where to store
ring_buffer
event_file
histogram

Key Predefined Sessions

SessionWhat It CapturesEnabled By Default
system_healthErrors, deadlocks, memory issues✅ Yes
AlwaysOn_healthAG state changes, failover events✅ (if AG configured)

Common XEvent Use Cases

ScenarioEvents to Capture
Slow queriessql_statement_completed (filter: duration > 5s)
Deadlocksxml_deadlock_report (or check system_health)
Login failureserror_reported (error_number = 18456)
Plan changesquery_post_compilation_showplan
Wait analysiswait_info, wait_completed
🎯 Exam Focus

Extended Events replaces SQL Profiler (Profiler is deprecated). XEvents uses less CPU, supports filtering at the source (predicates), and works on Azure SQL DB, MI, and VMs. system_health session is always running — check it first for deadlocks and errors.

🏢 Real-World DBA Note

Oracle DBA parallel: Extended Events = Oracle's ALTER SESSION SET EVENTS + DBMS_MONITOR + AWR event tracing, but unified in one framework. The system_health session is like Oracle's incident capture — always running, catches critical events automatically.

Database Watcher (Preview)

Near real-time monitoring dashboard that doesn't require manual setup of diagnostic pipelines.

FeatureDetails
Data storeAzure Data Explorer (Kusto)
Query languageKQL
LatencyNear real-time (~seconds)
SupportedAzure SQL DB, MI
DashboardsPre-built Grafana dashboards
Key metricsSessions, waits, query stats, resource usage, blocking
🎯 Exam Focus

Database Watcher is Microsoft's newest monitoring solution — it streams data to Azure Data Explorer for KQL analysis. If the exam mentions "near real-time monitoring with KQL dashboards" → Database Watcher.

SQL Insights (Azure Monitor)

Deeper telemetry than basic Azure Monitor metrics. Uses a monitoring VM to collect DMV data.

🗄️
Azure SQL DB/MI/VM
Source database
DMV queries collected
📡
Monitoring VM (Telegraf)
Collects DMV data
Pushes to Log Analytics
📊
Log Analytics
Pre-built Workbooks
Top queries, waits, sessions
Azure Monitor Alerts

Resource Governor (VM & MI)

Isolates workloads by limiting CPU, memory, and I/O per workload group.

Resource Governor Architecture
🏊
Resource Pool: OLTP
CPU: 60%, Memory: 60% — Production workload
📊
Resource Pool: Reporting
CPU: 30%, Memory: 30% — BI queries
⚙️
Resource Pool: Default
CPU: 10%, Memory: 10% — Admin tasks

Architecture components:

  1. Resource Pool — defines CPU/memory/I/O limits
  2. Workload Group — maps to a resource pool with additional settings (request limits)
  3. Classifier Function — routes incoming sessions to workload groups based on rules
🎯 Exam Focus

Resource Governor is available on SQL Server on VM and MI (not Azure SQL DB). If the exam says "prevent reporting queries from consuming all CPU" → Resource Governor. For Azure SQL DB, use read replicas (BC tier) or Named Replicas (Hyperscale) to isolate workloads instead.

🏢 Real-World DBA Note

Oracle DBA parallel: Resource Governor = Oracle Resource Manager. Resource Pools = Consumer Groups. Classifier Function = Resource Manager plan rules. Same concept: prevent one workload from starving another.

Intelligent Query Processing (IQP)

A family of automatic query optimization features. No code changes needed — just upgrade compatibility level.

IQP Features (Compatibility Level 150+)
🧠
Memory Grant Feedback
Auto-adjust memory grants over multiple executions
🔀
Adaptive Joins
Hash or Nested Loop decided at runtime based on actual rows
Batch Mode on Rowstore
Analytics speed on OLTP tables without columnstore
📐
Approximate Count
APPROX_COUNT_DISTINCT — fast, ~2% error
📋
Table Variable Deferred
Accurate cardinality for table variables
📌
Optimized Plan Forcing
Faster forced plan compilation via Query Store
🎯
Parameter Sensitive Plan
Multiple plans for parameter sniffing (Compat 160)
🔄
DOP Feedback
Auto-adjust parallelism degree (Compat 160)
📈
CE Feedback
Auto-adjust cardinality estimates (Compat 160)

Key exam-tested IQP features:

FeatureProblem It SolvesCompat Level
Memory Grant FeedbackQueries request too much/too little memory (spills to disk)150
Adaptive JoinsOptimizer picks wrong join type at compile time150
Batch Mode on RowstoreAnalytics queries slow on OLTP tables150
Parameter Sensitive PlanParameter sniffing → bad plan for some values160
DOP FeedbackOver/under-parallelism wastes resources160
🎯 Exam Focus

IQP requires the right compatibility level (150 = SQL 2019, 160 = SQL 2022). If the exam asks "how to enable adaptive joins" → set compatibility level to 150+. No query changes needed — it's automatic.

Statistics Maintenance

Statistics tell the query optimizer about data distribution. Stale statistics → bad plans → poor performance.

SettingBehaviorBest For
AUTO_UPDATE_STATISTICSUpdates when ~20% of rows changeMost workloads (default: ON)
AUTO_UPDATE_STATISTICS_ASYNCUpdates in background (doesn't block queries)OLTP (avoids query compilation delays)
Manual UPDATE STATISTICSOn-demand, full scan or sample %After bulk loads, partition switches
🎯 Exam Focus

After large data loads: always run UPDATE STATISTICS manually. Auto-update won't trigger until the 20% threshold. For partitioned tables, update stats on specific partitions with UPDATE STATISTICS ... WITH RESAMPLE ON PARTITIONS(N).

Execution Plans — What to Look For

Red Flag in PlanWhat It MeansFix
Table/Index Scan (on large table)Missing index or unSARGable WHERECreate covering index
Key Lookup (high cost)Index doesn't include needed columnsAdd INCLUDE columns
Sort (with warning ⚠️)Memory spill to tempdbAdd index matching ORDER BY, or increase memory
Parallelism (CXPACKET waits)Skewed distribution across threadsUpdate stats, limit MAXDOP
Thick arrows (many rows)Cardinality estimate wrongUpdate stats, rewrite query
Nested Loops (on large sets)Should be Hash MatchUpdate stats, hint if needed
Good Plan
Index Seek → Nested Loop (small inner) → SELECT
Fast, minimal I/O, targeted reads
Bad Plan
Table Scan → Hash Match (high memory) → Sort ⚠️ (spill) → SELECT
Slow, excessive I/O, memory spills
🏢 Real-World DBA Note

Oracle DBA parallel: Execution plans in SQL Server are like Oracle explain plans. Key difference: SQL Server shows actual rows vs estimated in actual plans (like Oracle's V$SQL_PLAN_STATISTICS_ALL). Look for big gaps between estimated and actual — that means stale stats.


Flashcards

What are Extended Events and why use them?
Click to reveal answer
Lightweight event tracing system replacing SQL Profiler. Uses Events + Predicates (filters) + Actions (extra data) + Targets (storage). Lower overhead than Profiler.
1 / 8

Quiz

Q1/4
0 correct
You need to isolate reporting queries so they don't consume more than 30% CPU on a SQL Managed Instance. What feature do you use?