Skip to main content

Table Partitioning, Sharding & Data Compression β€” Deep Dive

The April 24, 2026 DP-300 outline calls out three Domain 1 skills around physical data organization:

"Plan and configure table partitioning"
"Recommend database sharding solutions"
"Configure data compression"

This page covers all three end-to-end with T-SQL, decision tables, and operational guidance.

Part 1 β€” Table partitioning​

What it is (and isn't)​

Partitioning splits one logical table into multiple physical partitions based on a partitioning column. The table looks like a single object to applications and queries, but storage, statistics, and certain operations work per partition.

Partitioning doesPartitioning does not
Enable instant partition switching for data loads & purgesMake point queries faster automatically
Enable partition elimination in plans (when predicates align)Replace good indexing
Allow piecemeal restores by filegroupDistribute data across servers (that's sharding)
Reduce locking blast radius for maintenanceIncrease parallelism on small tables
🎯 Exam Focus

The exam loves the sliding window scenario: monthly fact tables where you add a new partition for the new month and switch out the oldest one. If a question says "load the day's data into a staging table, then make it visible instantly without INSERT", the answer is partition switching (ALTER TABLE ... SWITCH PARTITION). :::

Architecture: function + scheme + table​

The three-piece partitioning model
πŸ“
1. Partition function
Defines boundary points
RANGE LEFT or RANGE RIGHT
Maps a value β†’ a partition number
πŸ—‚οΈ
2. Partition scheme
Maps partitions β†’ filegroups
ALL TO [PRIMARY] is common in Azure SQL
Per-FG layout enables piecemeal restore
πŸ“‹
3. Partitioned table
Created ON <scheme>(<column>)
All indexes should be aligned
Switch / merge / split operate per partition

Boundary semantics β€” RANGE LEFT vs RANGE RIGHT​

DefinitionBoundary value belongs to…
RANGE LEFT FOR VALUES ('2025-01-01','2025-02-01')…the partition at or before the boundary (≀)
RANGE RIGHT FOR VALUES ('2025-01-01','2025-02-01')…the partition at or after the boundary (β‰₯)

For date-based sliding windows the convention is RANGE RIGHT with the first day of each month as the boundary. This puts each month neatly into its own partition without "off-by-one" headaches.

Step-by-step build​

-- 1. Create filegroups (optional; on Azure SQL DB use PRIMARY)
ALTER DATABASE Sales ADD FILEGROUP FG_2025_Q1;
ALTER DATABASE Sales ADD FILEGROUP FG_2025_Q2;

-- 2. Partition function (monthly, RANGE RIGHT, datetime2)
CREATE PARTITION FUNCTION pf_OrderDate (datetime2)
AS RANGE RIGHT FOR VALUES (
'2025-01-01','2025-02-01','2025-03-01','2025-04-01',
'2025-05-01','2025-06-01','2025-07-01','2025-08-01',
'2025-09-01','2025-10-01','2025-11-01','2025-12-01',
'2026-01-01'
);

-- 3. Partition scheme β€” all to PRIMARY for Azure SQL DB
CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

-- 4. Partitioned table (note: partition column must be in the clustered key)
CREATE TABLE dbo.Orders
(
OrderId BIGINT NOT NULL,
OrderDate datetime2(3) NOT NULL,
CustomerId INT NOT NULL,
Amount DECIMAL(19,4) NOT NULL,
CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderId)
ON ps_OrderDate(OrderDate)
);

-- 5. Aligned nonclustered index (also partitioned on the same scheme)
CREATE NONCLUSTERED INDEX IX_Orders_Customer
ON dbo.Orders (CustomerId)
INCLUDE (Amount)
ON ps_OrderDate(OrderDate);
🎯 Exam Focus

Aligned indexes are mandatory for partition switching. A nonclustered index that is not on the same partition scheme blocks ALTER TABLE ... SWITCH PARTITION. :::

The sliding window pattern​

Every month: add a new boundary on the right, switch out the oldest partition into an archive table.

-- A. Make next month's filegroup the next-used FG (no-op on Azure SQL DB)
ALTER PARTITION SCHEME ps_OrderDate NEXT USED [PRIMARY];

-- B. Add the new boundary (creates a new empty partition for Feb-2026)
ALTER PARTITION FUNCTION pf_OrderDate () SPLIT RANGE ('2026-02-01');

-- C. Create an empty staging table aligned to the partition you'll switch out
CREATE TABLE dbo.Orders_Archive_2025_01
(
OrderId BIGINT NOT NULL,
OrderDate datetime2(3) NOT NULL,
CustomerId INT NOT NULL,
Amount DECIMAL(19,4) NOT NULL,
CONSTRAINT PK_Orders_Archive_2025_01 PRIMARY KEY CLUSTERED (OrderDate, OrderId)
ON [PRIMARY]
);

-- D. Switch the January 2025 partition OUT of Orders into the archive β€” METADATA only, instant
ALTER TABLE dbo.Orders
SWITCH PARTITION 2 TO dbo.Orders_Archive_2025_01;

-- E. Merge the now-empty boundary so the partition count stays steady
ALTER PARTITION FUNCTION pf_OrderDate () MERGE RANGE ('2025-01-01');

The SWITCH operation is a metadata-only rename β€” no data is copied, no log is generated for the rows. This is the canonical "instant load / instant purge" pattern for warehouses.

Inspect what you've built​

SELECT  ps.name        AS scheme,
pf.name AS function,
p.partition_number,
p.rows,
rv.value AS boundary_value,
ds.name AS filegroup
FROM sys.partitions p
JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
JOIN sys.partition_functions pf ON pf.function_id = ps.function_id
LEFT JOIN sys.partition_range_values rv
ON rv.function_id = pf.function_id AND rv.boundary_id = p.partition_number
JOIN sys.destination_data_spaces dds
ON dds.partition_scheme_id = ps.data_space_id AND dds.destination_id = p.partition_number
JOIN sys.data_spaces ds ON ds.data_space_id = dds.data_space_id
WHERE p.object_id = OBJECT_ID('dbo.Orders') AND i.index_id <= 1
ORDER BY p.partition_number;

Switch-out rules cheat-sheet​

For ALTER TABLE source SWITCH PARTITION n TO target:

RuleRequired
Both tables in the same database & filegroupβœ…
Identical structure (columns, datatypes, nullability)βœ…
All indexes on source must be aligned (same partition scheme)βœ…
Target partition must be emptyβœ…
CHECK constraint on target enforcing the partition's boundaryβœ…
Both tables identical compression setting on the affected partitionβœ…

Part 2 β€” Database sharding (recommend)​

What sharding is​

While partitioning keeps everything in one database, sharding spreads rows across many databases (often many servers) using a shard key. Azure SQL Database supports sharding via the Elastic Database Tools client library.

When to recommend sharding​

SignalSharding fits?
Single DB hits the storage ceiling of its tier (e.g., 4 TB on Hyperscale Premium-series β†’ unlikely; on a single GP DB β†’ maybe)βœ…
Multi-tenant SaaS where each tenant is a databaseβœ…
Horizontal write scaling needed beyond one Azure SQL DBβœ…
Read-heavy single-tenant workload❌ β€” use read scale-out or Hyperscale named replicas
Pure analytics workload❌ β€” use Synapse / Fabric

Sharding patterns​

PatternDescriptionBest for
Single-tenant per shardEach customer = one DBSaaS apps with tenants of varied sizes
Multi-tenant per shardMany small tenants packed per DBSMB SaaS, predictable tenant size
Range shardingcustomerId BETWEEN 1 AND 10000 β†’ shard ABulk loads, predictable cardinality
Hash shardinghash(customerId) % N β†’ shard NEven distribution, no hotspots
Lookup sharding (shard map manager)Mapping table resolves shardKey β†’ DBMaximum flexibility, supports re-balancing

Elastic Database Tools β€” components​

πŸ—ΊοΈ
Shard Map Manager
A small Azure SQL DB that stores the mapping (shardKey β†’ shard DB)
βš™οΈ
Elastic Client Library
.NET / Java SDK that resolves the shard for each request and pools connections
πŸ”§
Elastic Tools
Split-Merge tool, multi-shard query API, elastic jobs (run T-SQL across shards)

If the scenario says "scale out beyond one Azure SQL DB and we need cross-shard queries":

  1. Shard map manager in Azure SQL DB (β‰₯ S1).
  2. Elastic pool to host the shards cost-effectively.
  3. Hash sharding unless the data has natural ranges.
  4. Elastic Jobs to run maintenance T-SQL across all shards.
  5. Read scale-out within each shard for read amplification.

Part 3 β€” Data compression​

Compression saves storage and I/O (less to read = fewer logical/physical reads). It costs CPU on modify; reads are usually faster end-to-end because the buffer pool holds more pages.

Compression types​

TypeWhat it doesTypical savingsBest for
ROWVariable-length storage for fixed-length types; trim trailing zeros/spaces10–30%OLTP heavy update tables
PAGEROW + prefix + dictionary compression at the page level30–60%Read-heavy, redundant values, warehouses
COLUMNSTOREColumnar storage with run-length & dictionary encoding60–90%Analytics fact tables (β‰₯ 1 M rows)
COLUMNSTORE_ARCHIVEColumnstore + extra LZ77 pass+30% over columnstoreCold partitions you rarely scan

Apply compression​

-- Whole table or index, single setting
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Per partition (mix hot=ROW, cold=PAGE on the same table)
ALTER TABLE dbo.Orders REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 10),
DATA_COMPRESSION = ROW ON PARTITIONS (11 TO 14));

-- Index
ALTER INDEX IX_Orders_Customer ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);

-- Columnstore
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE); -- COLD partitions

Estimating savings before you commit​

EXEC sp_estimate_data_compression_savings 
@schema_name = N'dbo',
@object_name = N'Orders',
@index_id = NULL,
@partition_number = NULL,
@data_compression = N'PAGE';

Returns size before/after for each partition so you can budget the rebuild.

Decision matrix​

Workload signatureRecommend
OLTP, very high update churn, narrow rowsROW (low CPU cost)
OLTP/mixed with repeated values per pagePAGE
Warehouse fact table, mostly read, β‰₯ 1 M rowsClustered columnstore
Old fact partitions, queried occasionallyColumnstore archive
Tiny tables (< a few MB)None β€” overhead not worth it
🏒 Real-World DBA Note

Oracle DBA parallel: ROW β‰ˆ Oracle Basic Compression; PAGE β‰ˆ OLTP Compression (Advanced Compression option); columnstore + archive β‰ˆ HCC (Hybrid Columnar Compression) on Exadata. The rule "compress hot in ROW, cold in COLUMNSTORE_ARCHIVE per partition" mirrors Oracle's Heat Map + ADO policies. :::

Bringing it together β€” partitioned + compressed warehouse table​

-- Hot partitions PAGE-compressed (last 3 months) ; cold partitions COLUMNSTORE_ARCHIVE
ALTER TABLE dbo.Orders REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE ON PARTITIONS (1 TO 10),
DATA_COMPRESSION = PAGE ON PARTITIONS (11 TO 13),
DATA_COMPRESSION = NONE ON PARTITIONS (14));

This gives you fast modifications on the current partition, balanced older months, and maximum compression on archive months β€” all in one table.


Anti-Patterns​

  • "Partition every large table." Partitioning helps sliding-window data lifecycle and partition elimination on the partition column. A 50 M-row table queried by non-partition columns gets no benefit β€” just management overhead. Use it because of an access pattern, not because the table is big.
  • "PAGE compression on everything for the storage savings." PAGE compression has CPU cost on writes. OLTP hot tables with high write rates can lose 5–10 % CPU. Use ROW compression for write-heavy, PAGE for read-heavy.
  • "COLUMNSTORE_ARCHIVE on the current partition for max savings." COLUMNSTORE_ARCHIVE is read-optimized (10–20Γ— compression) but writes cost ~3Γ—. Reserve it for partitions that are truly immutable.
  • "ALTER TABLE ... SWITCH is instant for any 2 partitions." Both partitions must have identical schema, indexes, constraints, and compression settings. Mismatched compression = the SWITCH is rejected. Plan compression at the partition scheme stage.
  • "Partition function = the index." No β€” the partition column should also be the leading column of the clustered index for best partition elimination. Otherwise the optimizer scans across partitions.
  • "Just RIGHT or just LEFT, doesn't matter." Off-by-one bugs at the boundary cost you a future weekend. Date partitioning = RANGE RIGHT (boundary date belongs to the new partition). Numeric ID partitioning = either, just be consistent.
⚠️ Watch Out

Compression is set per-partition, not per-table. When you ALTER TABLE ... REBUILD, you reset all partitions to the table-level compression unless you specify per-partition. Always use the explicit ON PARTITIONS (...) form for partitioned tables.


Migration Between Partitioning / Compression Strategies​

From β†’ ToPathCost
Non-partitioned β†’ Partitioned (same data)Create partition function + scheme, drop+recreate clustered index ON scheme(col)Offline rebuild; sized as a full clustered index rebuild
Single-filegroup partitioned β†’ Multi-filegroup partitionedModify partition scheme to map partitions to different FGs; rebuild affected partitionsPer-partition rebuild; gain backup/restore granularity
ROW β†’ PAGE compression (whole table)ALTER INDEX ... REBUILD WITH (DATA_COMPRESSION = PAGE)One online rebuild (Enterprise/MI/Azure SQL); ~30 % more savings
PAGE β†’ COLUMNSTORE (whole table)CREATE CLUSTERED COLUMNSTORE INDEXDrops rowstore CL index; analytical query gains; OLTP loses
Page/Row β†’ mixed per-partition (hot/cold)Per-partition REBUILD WITH (DATA_COMPRESSION = ...)Per-partition online rebuild; biggest storage win
Sliding-window via DELETE β†’ SWITCH OUTBuild empty staging table with matching schema; SWITCHNear-instant metadata operation vs hours of DELETE
RANGE LEFT β†’ RANGE RIGHTRecreate partition function (no in-place change)Full table rebuild via SWITCH or BCP

Most expensive moves: changing partition function direction (LEFT↔RIGHT) and rowstore↔columnstore conversion.


Real Scenarios​

  1. Sales fact table, 2 B rows, 5-year retention, monthly archive β†’ RANGE RIGHT monthly partition function, partition scheme spreading partitions across 3 FGs, current month PAGE, prior 11 months PAGE, > 12 months COLUMNSTORE_ARCHIVE, monthly SWITCH OUT to archive table. Driver: lifecycle + storage. Trade-off: SWITCH script must be 100 % schema-aligned.
  2. OLTP order table, 200 GB, write-heavy β†’ No partitioning, ROW compression. Driver: writes dominate. Trade-off: fewer storage savings than PAGE.
  3. Time-series IoT data, 10 TB/year, query last 7 days only β†’ Daily partitions on event_time, columnstore on > 7-day partitions, rowstore on current 7. Driver: hot/cold split. Trade-off: 30+ partitions/month β€” automate via Elastic Job or SQL Agent.
  4. Reporting cube replaced by SQL Server columnstore β†’ Wide fact table, partitioned by date, columnstore index, no compression mixing. Driver: analytical queries. Trade-off: load performance β€” use staging table + SWITCH.
  5. Multi-tenant SaaS: per-tenant partitions β†’ Hash-style RANGE partition by tenant_id bucket (e.g., 16 buckets), PAGE compression. Driver: noisy-neighbor isolation at index level. Trade-off: re-balancing buckets is painful β€” size at design time.

Flashcards​

What three objects make up a partitioned table in SQL?
Click to reveal answer
Partition function (boundary values), partition scheme (function β†’ filegroups), and the table created ON the scheme.
1 / 7

Practice quiz​

Q1/5
0 correct
You manage a fact table that loads 50 GB nightly and must purge data older than 24 months without blocking reads. Which technique should you implement?

Sources & further reading​

Last verified against MS Learn: April 28, 2026.