Monitoring Tools Matrix
Too many monitoring tools, not sure which to use? This page cuts through the confusion — one place to understand what each tool does, when to use it, and how they relate.
The Matrix
| Tool | Best For | SQL DB | MI | VM | Complexity | Exam Weight |
|---|---|---|---|---|---|---|
| Query Store | Query plan history, regression detection | ✅ | ✅ | ✅ | Low | 🎯 High |
| DMVs | Real-time diagnostics (waits, sessions, index stats) | ✅ | ✅ | ✅ | Medium | 🎯 High |
| Azure Monitor Metrics | Platform-level health (CPU%, DTU%, IO%) | ✅ | ✅ | Partial | Low | 🎯 High |
| Azure Monitor Logs (KQL) | Deep log analysis, custom queries | ✅ | ✅ | ✅ | Medium | 📖 Medium |
| Extended Events | Lightweight targeted tracing | ✅ | ✅ | ✅ | High | 📖 Medium |
| Intelligent Insights | ML-based anomaly detection | ✅ | ✅ | ❌ | Low | 📖 Medium |
| Query Performance Insight | Portal-based top query view | ✅ | ❌ | ❌ | Low | 📖 Low |
| SQL Insights (Azure Monitor) | Centralized fleet monitoring | ✅ | ✅ | ✅ | Medium | 📚 Low |
| Database Watcher | Real-time Azure SQL estate monitoring | ✅ | ✅ | ❌ | Low | 📚 New |
| SQL Assessment | Best practice configuration checks | ❌ | ❌ | ✅ | Low | 📖 Medium |
| Activity Log | Control-plane events (scale, failover, config) | ✅ | ✅ | ✅ | Low | 🎯 Medium |
DP-300 focuses on: Query Store (plan regression, forced plans), DMVs (wait stats, missing indexes), Azure Monitor Metrics (alerting), and Activity Log (failover detection). Know these 4 well.
"If You Suspect X, Start With Y"
| Symptom | Start With | Then Go Deeper |
|---|---|---|
| Slow queries | Query Store → Regressed Queries view | DMVs: sys.dm_exec_query_stats |
| High CPU | Azure Monitor: cpu_percent metric | DMVs: sys.dm_exec_requests → find top CPU queries |
| Blocking / deadlocks | DMVs: sys.dm_exec_requests + sys.dm_tran_locks | Extended Events: blocked_process_report |
| I/O bottleneck | Azure Monitor: physical_data_read_percent | DMVs: sys.dm_io_virtual_file_stats |
| Missing indexes | DMVs: sys.dm_db_missing_index_* | Query Store: find queries that would benefit |
| Plan regression | Query Store → "Regressed Queries" | Automatic Tuning: FORCE_LAST_GOOD_PLAN |
| Connection failures | Azure Monitor: connection_failed metric | Activity Log: check for failover events |
| Throttling (log rate) | Azure Monitor: log_write_percent = 100% | Scale up vCores or move to BC tier |
| Unexpected failover | Activity Log: filter by failover events | Azure Monitor: check resource health |
| Memory pressure | DMVs: sys.dm_os_memory_clerks | Query Store: find memory-heavy queries |
| tempdb contention | DMVs: sys.dm_exec_requests (wait type = PAGELATCH) | Add tempdb files (VM only) |
| Index fragmentation | DMVs: sys.dm_db_index_physical_stats | Schedule rebuild/reorganize maintenance |
Query Store — Quick Decision Guide
Query Store is your flight recorder — it captures every query's plan and performance over time.
| Question | Answer |
|---|---|
| Is it enabled by default? | ✅ Yes on Azure SQL DB. ❌ Manual on VM. ✅ Yes on MI. |
| Where is data stored? | In the user database (not system DB) |
| Can I force a specific plan? | ✅ Yes — like Oracle SQL Profiles |
| Does it detect regressions? | ✅ Yes — "Regressed Queries" view |
| Does it auto-fix regressions? | ✅ With FORCE_LAST_GOOD_PLAN (Automatic Tuning) |
| Max retention? | Configurable (default 30 days, size-based cleanup) |
When to Use Query Store
| Scenario | Use Query Store? |
|---|---|
| "Why did this query slow down last week?" | ✅ |
| "Which queries use the most CPU?" | ✅ |
| "Did a plan change cause regression?" | ✅ |
| "What's happening right now?" | ❌ Use DMVs |
| "What wait types are most common?" | ✅ (wait stats per query in QS) |
DMV Quick Reference
DMVs give you real-time, in-the-moment diagnostics. They don't store history — for history, use Query Store.
Essential DMVs for DP-300
| DMV | What It Shows | Use When |
|---|---|---|
sys.dm_exec_requests | Currently running queries + waits | "What's running right now?" |
sys.dm_exec_sessions | Active sessions and connection info | "Who's connected?" |
sys.dm_exec_query_stats | Aggregated query performance (CPU, reads, duration) | "Top resource consumers" |
sys.dm_os_wait_stats | Server-wide wait statistics | "What is SQL Server waiting on?" |
sys.dm_db_missing_index_* | Missing index suggestions | "Which indexes should I create?" |
sys.dm_db_index_physical_stats | Index fragmentation levels | "Which indexes need maintenance?" |
sys.dm_io_virtual_file_stats | File-level I/O statistics | "Is storage the bottleneck?" |
sys.dm_tran_locks | Current lock information | "What's causing blocking?" |
sys.dm_exec_sql_text() | SQL text for a given handle | Used with other DMVs via CROSS APPLY |
sys.dm_exec_query_plan() | Execution plan XML | Used with other DMVs via CROSS APPLY |
Key Wait Types to Know
| Wait Type | Meaning | Common Fix |
|---|---|---|
CXPACKET / CXCONSUMER | Parallelism waits | Tune MAXDOP, Cost Threshold |
PAGEIOLATCH_* | Waiting for I/O (data from disk) | Missing indexes, scale storage |
LCK_M_* | Lock waits (blocking) | Optimize queries, shorter transactions |
WRITELOG | Log write waits | Scale storage, move log to faster disk |
SOS_SCHEDULER_YIELD | CPU pressure | Scale up, optimize queries |
RESOURCE_SEMAPHORE | Memory grant waits | Reduce query memory needs, scale up |
PAGELATCH_* | tempdb contention | Add tempdb files |
Extended Events — When DMVs Aren't Enough
Extended Events (XEvents) are the lightweight tracing engine. Use when you need to capture specific events over time.
| Use Case | XEvent to Use |
|---|---|
| Capture deadlock graphs | xml_deadlock_report |
| Track long-running queries | sql_batch_completed with duration filter |
| Monitor blocking | blocked_process_report (set threshold first) |
| Audit login failures | error_reported with error 18456 |
| Track query timeouts | attention event |
| Monitor file growth | database_file_size_change |
Extended Events vs SQL Trace/Profiler: SQL Trace and SQL Server Profiler are deprecated. Extended Events is the replacement. The exam expects you to know XEvents, not Profiler. XEvents have lower overhead and support Azure SQL DB/MI.
Azure-Native Monitoring Tools
Azure Monitor Metrics
- What: Platform-level numeric time-series (CPU%, DTU%, IO%, sessions)
- Latency: ~1 minute
- Retention: 93 days
- Cost: Free (platform metrics)
- Best for: Real-time health dashboards and metric alerts
Azure Monitor Logs (Log Analytics)
- What: Structured event records queryable with KQL
- Latency: 2-5 minutes
- Retention: 30-730 days
- Cost: Per GB ingested
- Best for: Deep analysis, audit queries, custom dashboards
Intelligent Insights
- What: ML-based performance anomaly detection
- Works with: Azure SQL DB and MI
- Best for: Automatic detection of regression, blocking, resource limits
- Output: JSON insight in diagnostic logs
Query Performance Insight
- What: Portal blade showing top queries by CPU/duration/execution count
- Works with: Azure SQL DB only
- Powered by: Query Store data
- Best for: Quick visual "what's consuming resources?"
Database Watcher (Preview)
- What: Near-real-time monitoring for Azure SQL estate
- Works with: Azure SQL DB and MI
- Best for: Fleet monitoring, dashboards, data collection to Azure Data Explorer
- Key benefit: Low-latency SQL-level monitoring without agents
SQL Assessment (VM only)
- What: Configuration best-practice checker
- Works with: SQL on VM (via SQL IaaS Agent)
- Best for: Validating VM storage, MAXDOP, tempdb, memory settings
- Output: Recommendations in Azure Portal
Common Troubleshooting Workflow
Step 1: Detect
| Signal | Tool |
|---|---|
| High CPU alert fires | Azure Monitor → cpu_percent |
| User reports "application is slow" | Azure Monitor → check all metrics |
| Scheduled job failed | SQL Agent history or Azure Monitor alerts |
Step 2: Diagnose
| Finding | Next Action |
|---|---|
| CPU > 90% sustained | DMVs → sys.dm_exec_requests → find top CPU queries |
| One specific query is slow | Query Store → check plan history → compare plans |
| Many queries suddenly slow | Query Store → Regressed Queries → check for plan changes |
| Blocking detected | DMVs → sys.dm_exec_requests (blocking_session_id) |
| I/O > 90% | DMVs → sys.dm_io_virtual_file_stats → check latency |
Step 3: Fix
| Root Cause | Fix |
|---|---|
| Bad query plan | Force good plan in Query Store |
| Missing index | Create index from sys.dm_db_missing_index_details |
| Resource limit hit | Scale up (more vCores, BC tier, or Hyperscale) |
| Blocking chain | Optimize the blocking query, shorten transactions |
| Parameter sniffing | Use OPTIMIZE FOR, RECOMPILE, or PSP (compat 160) |
Tool Relationships
| Tool | Feeds Into | Powered By |
|---|---|---|
| Query Store | Query Performance Insight, Automatic Tuning | SQL engine internal capture |
| DMVs | Custom scripts, monitoring solutions | SQL engine real-time state |
| Azure Monitor Metrics | Metric Alerts, Dashboards, Workbooks | Platform telemetry |
| Azure Monitor Logs | Log Alerts, KQL queries, Workbooks | Diagnostic Settings export |
| Extended Events | Custom analysis, deadlock review | SQL engine event framework |
| Intelligent Insights | Azure Monitor Logs | ML analysis of Query Store + DMV data |
| Database Watcher | Azure Data Explorer dashboards | Direct SQL data collection |