Monitoring Tools Matrix

Too many monitoring tools, not sure which to use? This page cuts through the confusion — one place to understand what each tool does, when to use it, and how they relate.

The Matrix

Tool	Best For	SQL DB	MI	VM	Complexity	Exam Weight
Query Store	Query plan history, regression detection	✅	✅	✅	Low	🎯 High
DMVs	Real-time diagnostics (waits, sessions, index stats)	✅	✅	✅	Medium	🎯 High
Azure Monitor Metrics	Platform-level health (CPU%, DTU%, IO%)	✅	✅	Partial	Low	🎯 High
Azure Monitor Logs (KQL)	Deep log analysis, custom queries	✅	✅	✅	Medium	📖 Medium
Extended Events	Lightweight targeted tracing	✅	✅	✅	High	📖 Medium
Intelligent Insights	ML-based anomaly detection	✅	✅	❌	Low	📖 Medium
Query Performance Insight	Portal-based top query view	✅	❌	❌	Low	📖 Low
SQL Insights (Azure Monitor)	Centralized fleet monitoring	✅	✅	✅	Medium	📚 Low
Database Watcher	Real-time Azure SQL estate monitoring	✅	✅	❌	Low	📚 New
SQL Assessment	Best practice configuration checks	❌	❌	✅	Low	📖 Medium
Activity Log	Control-plane events (scale, failover, config)	✅	✅	✅	Low	🎯 Medium

🎯 Exam Focus

DP-300 focuses on: Query Store (plan regression, forced plans), DMVs (wait stats, missing indexes), Azure Monitor Metrics (alerting), and Activity Log (failover detection). Know these 4 well.

"If You Suspect X, Start With Y"

Symptom	Start With	Then Go Deeper
Slow queries	Query Store → Regressed Queries view	DMVs: `sys.dm_exec_query_stats`
High CPU	Azure Monitor: `cpu_percent` metric	DMVs: `sys.dm_exec_requests` → find top CPU queries
Blocking / deadlocks	DMVs: `sys.dm_exec_requests` + `sys.dm_tran_locks`	Extended Events: `blocked_process_report`
I/O bottleneck	Azure Monitor: `physical_data_read_percent`	DMVs: `sys.dm_io_virtual_file_stats`
Missing indexes	DMVs: `sys.dm_db_missing_index_*`	Query Store: find queries that would benefit
Plan regression	Query Store → "Regressed Queries"	Automatic Tuning: `FORCE_LAST_GOOD_PLAN`
Connection failures	Azure Monitor: `connection_failed` metric	Activity Log: check for failover events
Throttling (log rate)	Azure Monitor: `log_write_percent` = 100%	Scale up vCores or move to BC tier
Unexpected failover	Activity Log: filter by failover events	Azure Monitor: check resource health
Memory pressure	DMVs: `sys.dm_os_memory_clerks`	Query Store: find memory-heavy queries
tempdb contention	DMVs: `sys.dm_exec_requests` (wait type = `PAGELATCH`)	Add tempdb files (VM only)
Index fragmentation	DMVs: `sys.dm_db_index_physical_stats`	Schedule rebuild/reorganize maintenance

Query Store — Quick Decision Guide

Query Store is your flight recorder — it captures every query's plan and performance over time.

Question	Answer
Is it enabled by default?	✅ Yes on Azure SQL DB. ❌ Manual on VM. ✅ Yes on MI.
Where is data stored?	In the user database (not system DB)
Can I force a specific plan?	✅ Yes — like Oracle SQL Profiles
Does it detect regressions?	✅ Yes — "Regressed Queries" view
Does it auto-fix regressions?	✅ With `FORCE_LAST_GOOD_PLAN` (Automatic Tuning)
Max retention?	Configurable (default 30 days, size-based cleanup)

When to Use Query Store

Scenario	Use Query Store?
"Why did this query slow down last week?"	✅
"Which queries use the most CPU?"	✅
"Did a plan change cause regression?"	✅
"What's happening right now?"	❌ Use DMVs
"What wait types are most common?"	✅ (wait stats per query in QS)

DMV Quick Reference

DMVs give you real-time, in-the-moment diagnostics. They don't store history — for history, use Query Store.

Essential DMVs for DP-300

DMV	What It Shows	Use When
`sys.dm_exec_requests`	Currently running queries + waits	"What's running right now?"
`sys.dm_exec_sessions`	Active sessions and connection info	"Who's connected?"
`sys.dm_exec_query_stats`	Aggregated query performance (CPU, reads, duration)	"Top resource consumers"
`sys.dm_os_wait_stats`	Server-wide wait statistics	"What is SQL Server waiting on?"
`sys.dm_db_missing_index_*`	Missing index suggestions	"Which indexes should I create?"
`sys.dm_db_index_physical_stats`	Index fragmentation levels	"Which indexes need maintenance?"
`sys.dm_io_virtual_file_stats`	File-level I/O statistics	"Is storage the bottleneck?"
`sys.dm_tran_locks`	Current lock information	"What's causing blocking?"
`sys.dm_exec_sql_text()`	SQL text for a given handle	Used with other DMVs via CROSS APPLY
`sys.dm_exec_query_plan()`	Execution plan XML	Used with other DMVs via CROSS APPLY

Key Wait Types to Know

Wait Type	Meaning	Common Fix
`CXPACKET` / `CXCONSUMER`	Parallelism waits	Tune MAXDOP, Cost Threshold
`PAGEIOLATCH_*`	Waiting for I/O (data from disk)	Missing indexes, scale storage
`LCK_M_*`	Lock waits (blocking)	Optimize queries, shorter transactions
`WRITELOG`	Log write waits	Scale storage, move log to faster disk
`SOS_SCHEDULER_YIELD`	CPU pressure	Scale up, optimize queries
`RESOURCE_SEMAPHORE`	Memory grant waits	Reduce query memory needs, scale up
`PAGELATCH_*`	tempdb contention	Add tempdb files

Extended Events — When DMVs Aren't Enough

Extended Events (XEvents) are the lightweight tracing engine. Use when you need to capture specific events over time.

Use Case	XEvent to Use
Capture deadlock graphs	`xml_deadlock_report`
Track long-running queries	`sql_batch_completed` with duration filter
Monitor blocking	`blocked_process_report` (set threshold first)
Audit login failures	`error_reported` with error 18456
Track query timeouts	`attention` event
Monitor file growth	`database_file_size_change`

🎯 Exam Focus

Extended Events vs SQL Trace/Profiler: SQL Trace and SQL Server Profiler are deprecated. Extended Events is the replacement. The exam expects you to know XEvents, not Profiler. XEvents have lower overhead and support Azure SQL DB/MI.

Azure-Native Monitoring Tools

Azure Monitor Metrics

What: Platform-level numeric time-series (CPU%, DTU%, IO%, sessions)
Latency: ~1 minute
Retention: 93 days
Cost: Free (platform metrics)
Best for: Real-time health dashboards and metric alerts

Azure Monitor Logs (Log Analytics)

What: Structured event records queryable with KQL
Latency: 2-5 minutes
Retention: 30-730 days
Cost: Per GB ingested
Best for: Deep analysis, audit queries, custom dashboards

Intelligent Insights

What: ML-based performance anomaly detection
Works with: Azure SQL DB and MI
Best for: Automatic detection of regression, blocking, resource limits
Output: JSON insight in diagnostic logs

Query Performance Insight

What: Portal blade showing top queries by CPU/duration/execution count
Works with: Azure SQL DB only
Powered by: Query Store data
Best for: Quick visual "what's consuming resources?"

Database Watcher (Preview)

What: Near-real-time monitoring for Azure SQL estate
Works with: Azure SQL DB and MI
Best for: Fleet monitoring, dashboards, data collection to Azure Data Explorer
Key benefit: Low-latency SQL-level monitoring without agents

SQL Assessment (VM only)

What: Configuration best-practice checker
Works with: SQL on VM (via SQL IaaS Agent)
Best for: Validating VM storage, MAXDOP, tempdb, memory settings
Output: Recommendations in Azure Portal

Common Troubleshooting Workflow

Step 1: Detect

Signal	Tool
High CPU alert fires	Azure Monitor → `cpu_percent`
User reports "application is slow"	Azure Monitor → check all metrics
Scheduled job failed	SQL Agent history or Azure Monitor alerts

Step 2: Diagnose

Finding	Next Action
CPU > 90% sustained	DMVs → `sys.dm_exec_requests` → find top CPU queries
One specific query is slow	Query Store → check plan history → compare plans
Many queries suddenly slow	Query Store → Regressed Queries → check for plan changes
Blocking detected	DMVs → `sys.dm_exec_requests` (blocking_session_id)
I/O > 90%	DMVs → `sys.dm_io_virtual_file_stats` → check latency

Step 3: Fix

Root Cause	Fix
Bad query plan	Force good plan in Query Store
Missing index	Create index from `sys.dm_db_missing_index_details`
Resource limit hit	Scale up (more vCores, BC tier, or Hyperscale)
Blocking chain	Optimize the blocking query, shorten transactions
Parameter sniffing	Use `OPTIMIZE FOR`, `RECOMPILE`, or PSP (compat 160)

Tool Relationships

Tool	Feeds Into	Powered By
Query Store	Query Performance Insight, Automatic Tuning	SQL engine internal capture
DMVs	Custom scripts, monitoring solutions	SQL engine real-time state
Azure Monitor Metrics	Metric Alerts, Dashboards, Workbooks	Platform telemetry
Azure Monitor Logs	Log Alerts, KQL queries, Workbooks	Diagnostic Settings export
Extended Events	Custom analysis, deadlock review	SQL engine event framework
Intelligent Insights	Azure Monitor Logs	ML analysis of Query Store + DMV data
Database Watcher	Azure Data Explorer dashboards	Direct SQL data collection

Anti-Patterns

"Use Extended Events for everything because it's lightweight." XE is lightweight per event — a session capturing every sql_statement_completed on a busy OLTP server still produces gigabytes/hour and saturates the file target. Scope XE narrowly (filter by database_id, duration > N ms).
"Query Store is free, turn it on with defaults." Defaults capture every ad-hoc query and fill MAX_STORAGE_SIZE_MB (default 1 GB) in days on a busy server. Then QS goes read-only and you stop collecting new data. Set QUERY_CAPTURE_MODE = AUTO (skips trivial), bump storage on busy systems, monitor sys.database_query_store_options.
"DMVs and Query Store overlap, pick one." They answer different questions: DMVs = right now, QS = history. You need both.
"Database Watcher replaces Query Store." No — Database Watcher collects data including from QS and ships it to ADX for cross-server analysis. QS still has to be on at each source DB.
"Azure Monitor Metrics will tell me which query is slow." Metrics are server-level (DTU %, CPU %, log IO). They tell you something is slow, not which query. Drill into Query Store / QPI for the query identity.
"Just enable Diagnostic Settings to Log Analytics for everything." Per-GB ingest pricing. Sending all DMV/QS data to LA on 50 servers = surprise bill. Send what you'll query (errors, blocks, deadlocks) and keep deep diagnostics local.

⚠️ Watch Out

Query Store survives failover but not restore-to-different-server unless you copy sys.query_store_* tables. A RESTORE of a backup from another server brings the QS data with the database. PITR to a new DB also preserves QS. But manual schema-only deploys lose all QS history.

Migration Between Tools

From → To	Path	Cost
SQL Profiler / Trace → Extended Events	Convert trace to XE session via `sp_SQLskills_ConvertTraceToExtendedEvents` or rewrite	One-time rewrite; gain lightweight capture
Custom DMV scripts → Query Store	Enable QS, replace per-query DMV polling with `sys.query_store_*` views	Online; QS stores history DMVs throw away
Query Store → Database Watcher	Provision DW + ADX cluster; enable per-DB collection	Adds ADX cost; gain cross-server analysis
Diagnostic Settings to Storage → Log Analytics	Re-route Diagnostic Setting	Higher per-GB cost; gain KQL + alerts
Diagnostic Settings to Log Analytics → Event Hub	Add Event Hub destination	For SIEM/3rd-party export; doesn't replace LA
Manual XE analysis → Intelligent Insights	Enable Diagnostic Setting `SQLInsights` to LA	ML auto-detects regressions; doesn't replace deep dive
On-prem Performance Monitor → Azure Monitor Metrics	Use platform metrics + Workbooks	Free for basic metrics; Insights tier adds cost
Custom dashboards (Grafana) → Workbooks	Re-author in Workbook JSON	Native to portal; can still pair with Grafana for cross-cloud

The expensive moves are adding Database Watcher (ADX cluster) and shipping everything to Log Analytics (per-GB ingest).

Real Scenarios

Single Azure SQL DB, intermittent slowdowns → Query Store + Query Performance Insight + Automatic Tuning (FORCE_LAST_GOOD_PLAN). Driver: built-in, free, addresses regressions automatically. Trade-off: bound to one DB — no cross-server view.
Fleet of 200 MIs across regions → Database Watcher → ADX → Workbook. Driver: cross-server query patterns visible. Trade-off: ADX cluster ~$300/mo minimum + per-DB collection cost.
Random deadlocks once a week → XE session capturing xml_deadlock_report only, ring buffer target. Driver: targeted capture, near-zero overhead. Trade-off: ring buffer means losing data on restart — alert on the event to grab it.
Compliance requires 90-day audit trail of all logins → Auditing to Log Analytics + retention policy. Driver: KQL queryable, alerting native. Trade-off: LA ingest cost — limit to logon/logoff events, not data DML.
DBA wants real-time dashboard for ops floor → Azure Monitor Workbook with metrics + DMV-derived metrics via SQL Insights. Driver: refresh ≤ 1 min, no extra agent. Trade-off: Insights tier pricing per monitored target.

Flashcards

What is the difference between Query Store and DMVs?

Click to reveal answer

Query Store: historical per-query plan + stats (flight recorder). DMVs: real-time current state (what's happening NOW). Use QS for 'why did it slow down?' and DMVs for 'what's running right now?'

1 / 8

Quiz

Q1/5

0 correct

A query that ran fine for months suddenly became slow. Which tool should you check FIRST?

The Matrix​

"If You Suspect X, Start With Y"​

Query Store — Quick Decision Guide​

When to Use Query Store​

DMV Quick Reference​

Essential DMVs for DP-300​

Key Wait Types to Know​

Extended Events — When DMVs Aren't Enough​

Azure-Native Monitoring Tools​

Azure Monitor Metrics​

Azure Monitor Logs (Log Analytics)​

Intelligent Insights​

Query Performance Insight​

Database Watcher (Preview)​

SQL Assessment (VM only)​

Common Troubleshooting Workflow​

Step 1: Detect​

Step 2: Diagnose​

Step 3: Fix​

Tool Relationships​

Anti-Patterns​

Migration Between Tools​

Real Scenarios​

Flashcards​

Quiz​