Table Partitioning, Sharding & Data Compression — Deep Dive

The April 24, 2026 DP-300 outline calls out three Domain 1 skills around physical data organization:

"Plan and configure table partitioning"
"Recommend database sharding solutions"
"Configure data compression"

This page covers all three end-to-end with T-SQL, decision tables, and operational guidance.

Part 1 — Table partitioning

What it is (and isn't)

Partitioning splits one logical table into multiple physical partitions based on a partitioning column. The table looks like a single object to applications and queries, but storage, statistics, and certain operations work per partition.

Partitioning does	Partitioning does not
Enable instant partition switching for data loads & purges	Make point queries faster automatically
Enable partition elimination in plans (when predicates align)	Replace good indexing
Allow piecemeal restores by filegroup	Distribute data across servers (that's sharding)
Reduce locking blast radius for maintenance	Increase parallelism on small tables

🎯 Exam Focus

The exam loves the sliding window scenario: monthly fact tables where you add a new partition for the new month and switch out the oldest one. If a question says "load the day's data into a staging table, then make it visible instantly without INSERT", the answer is partition switching (ALTER TABLE ... SWITCH PARTITION). :::

Architecture: function + scheme + table

The three-piece partitioning model

📐

1. Partition function

Defines boundary points

RANGE LEFT or RANGE RIGHT

Maps a value → a partition number

🗂️

2. Partition scheme

Maps partitions → filegroups

ALL TO [PRIMARY] is common in Azure SQL

Per-FG layout enables piecemeal restore

📋

3. Partitioned table

Created ON <scheme>(<column>)

All indexes should be aligned

Switch / merge / split operate per partition

Boundary semantics — RANGE LEFT vs RANGE RIGHT

Definition	Boundary value belongs to…
`RANGE LEFT FOR VALUES ('2025-01-01','2025-02-01')`	…the partition at or before the boundary (≤)
`RANGE RIGHT FOR VALUES ('2025-01-01','2025-02-01')`	…the partition at or after the boundary (≥)

For date-based sliding windows the convention is RANGE RIGHT with the first day of each month as the boundary. This puts each month neatly into its own partition without "off-by-one" headaches.

Step-by-step build

-- 1. Create filegroups (optional; on Azure SQL DB use PRIMARY)
ALTER DATABASE Sales ADD FILEGROUP FG_2025_Q1;
ALTER DATABASE Sales ADD FILEGROUP FG_2025_Q2;

-- 2. Partition function (monthly, RANGE RIGHT, datetime2)
CREATE PARTITION FUNCTION pf_OrderDate (datetime2)
AS RANGE RIGHT FOR VALUES (
    '2025-01-01','2025-02-01','2025-03-01','2025-04-01',
    '2025-05-01','2025-06-01','2025-07-01','2025-08-01',
    '2025-09-01','2025-10-01','2025-11-01','2025-12-01',
    '2026-01-01'
);

-- 3. Partition scheme — all to PRIMARY for Azure SQL DB
CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

-- 4. Partitioned table (note: partition column must be in the clustered key)
CREATE TABLE dbo.Orders
(
    OrderId        BIGINT       NOT NULL,
    OrderDate      datetime2(3) NOT NULL,
    CustomerId     INT          NOT NULL,
    Amount         DECIMAL(19,4) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderId)
        ON ps_OrderDate(OrderDate)
);

-- 5. Aligned nonclustered index (also partitioned on the same scheme)
CREATE NONCLUSTERED INDEX IX_Orders_Customer
    ON dbo.Orders (CustomerId)
    INCLUDE (Amount)
    ON ps_OrderDate(OrderDate);

🎯 Exam Focus

Aligned indexes are mandatory for partition switching. A nonclustered index that is not on the same partition scheme blocks ALTER TABLE ... SWITCH PARTITION. :::

The sliding window pattern

Every month: add a new boundary on the right, switch out the oldest partition into an archive table.

-- A. Make next month's filegroup the next-used FG (no-op on Azure SQL DB)
ALTER PARTITION SCHEME ps_OrderDate NEXT USED [PRIMARY];

-- B. Add the new boundary (creates a new empty partition for Feb-2026)
ALTER PARTITION FUNCTION pf_OrderDate () SPLIT RANGE ('2026-02-01');

-- C. Create an empty staging table aligned to the partition you'll switch out
CREATE TABLE dbo.Orders_Archive_2025_01
(
    OrderId        BIGINT       NOT NULL,
    OrderDate      datetime2(3) NOT NULL,
    CustomerId     INT          NOT NULL,
    Amount         DECIMAL(19,4) NOT NULL,
    CONSTRAINT PK_Orders_Archive_2025_01 PRIMARY KEY CLUSTERED (OrderDate, OrderId)
        ON [PRIMARY]
);

-- D. Switch the January 2025 partition OUT of Orders into the archive — METADATA only, instant
ALTER TABLE dbo.Orders
    SWITCH PARTITION 2 TO dbo.Orders_Archive_2025_01;

-- E. Merge the now-empty boundary so the partition count stays steady
ALTER PARTITION FUNCTION pf_OrderDate () MERGE RANGE ('2025-01-01');

The SWITCH operation is a metadata-only rename — no data is copied, no log is generated for the rows. This is the canonical "instant load / instant purge" pattern for warehouses.

Inspect what you've built

SELECT  ps.name        AS scheme,
        pf.name        AS function,
        p.partition_number,
        p.rows,
        rv.value       AS boundary_value,
        ds.name        AS filegroup
FROM    sys.partitions p
JOIN    sys.indexes i               ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN    sys.partition_schemes ps    ON ps.data_space_id = i.data_space_id
JOIN    sys.partition_functions pf  ON pf.function_id = ps.function_id
LEFT JOIN sys.partition_range_values rv
        ON rv.function_id = pf.function_id AND rv.boundary_id = p.partition_number
JOIN    sys.destination_data_spaces dds
        ON dds.partition_scheme_id = ps.data_space_id AND dds.destination_id = p.partition_number
JOIN    sys.data_spaces ds          ON ds.data_space_id = dds.data_space_id
WHERE   p.object_id = OBJECT_ID('dbo.Orders') AND i.index_id <= 1
ORDER BY p.partition_number;

Switch-out rules cheat-sheet

For ALTER TABLE source SWITCH PARTITION n TO target:

Rule	Required
Both tables in the same database & filegroup	✅
Identical structure (columns, datatypes, nullability)	✅
All indexes on source must be aligned (same partition scheme)	✅
Target partition must be empty	✅
`CHECK` constraint on target enforcing the partition's boundary	✅
Both tables identical compression setting on the affected partition	✅

What sharding is

While partitioning keeps everything in one database, sharding spreads rows across many databases (often many servers) using a shard key. Azure SQL Database supports sharding via the Elastic Database Tools client library.

Signal	Sharding fits?
Single DB hits the storage ceiling of its tier (e.g., 4 TB on Hyperscale Premium-series → unlikely; on a single GP DB → maybe)	✅
Multi-tenant SaaS where each tenant is a database	✅
Horizontal write scaling needed beyond one Azure SQL DB	✅
Read-heavy single-tenant workload	❌ — use read scale-out or Hyperscale named replicas
Pure analytics workload	❌ — use Synapse / Fabric

Sharding patterns

Pattern	Description	Best for
Single-tenant per shard	Each customer = one DB	SaaS apps with tenants of varied sizes
Multi-tenant per shard	Many small tenants packed per DB	SMB SaaS, predictable tenant size
Range sharding	`customerId BETWEEN 1 AND 10000 → shard A`	Bulk loads, predictable cardinality
Hash sharding	`hash(customerId) % N → shard N`	Even distribution, no hotspots
Lookup sharding (shard map manager)	Mapping table resolves shardKey → DB	Maximum flexibility, supports re-balancing

Elastic Database Tools — components

🗺️

Shard Map Manager

A small Azure SQL DB that stores the mapping (shardKey → shard DB)

⚙️

Elastic Client Library

.NET / Java SDK that resolves the shard for each request and pools connections

🔧

Elastic Tools

Split-Merge tool, multi-shard query API, elastic jobs (run T-SQL across shards)

Recommended workflow on the exam

If the scenario says "scale out beyond one Azure SQL DB and we need cross-shard queries":

Shard map manager in Azure SQL DB (≥ S1).
Elastic pool to host the shards cost-effectively.
Hash sharding unless the data has natural ranges.
Elastic Jobs to run maintenance T-SQL across all shards.
Read scale-out within each shard for read amplification.

Part 3 — Data compression

Compression saves storage and I/O (less to read = fewer logical/physical reads). It costs CPU on modify; reads are usually faster end-to-end because the buffer pool holds more pages.

Compression types

Type	What it does	Typical savings	Best for
ROW	Variable-length storage for fixed-length types; trim trailing zeros/spaces	10–30%	OLTP heavy update tables
PAGE	ROW + prefix + dictionary compression at the page level	30–60%	Read-heavy, redundant values, warehouses
COLUMNSTORE	Columnar storage with run-length & dictionary encoding	60–90%	Analytics fact tables (≥ 1 M rows)
COLUMNSTORE_ARCHIVE	Columnstore + extra LZ77 pass	+30% over columnstore	Cold partitions you rarely scan

Apply compression

-- Whole table or index, single setting
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Per partition (mix hot=ROW, cold=PAGE on the same table)
ALTER TABLE dbo.Orders REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 10),
      DATA_COMPRESSION = ROW  ON PARTITIONS (11 TO 14));

-- Index
ALTER INDEX IX_Orders_Customer ON dbo.Orders
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);

-- Columnstore
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);  -- COLD partitions

Estimating savings before you commit

EXEC sp_estimate_data_compression_savings 
    @schema_name      = N'dbo',
    @object_name      = N'Orders',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';

Returns size before/after for each partition so you can budget the rebuild.

Decision matrix

Workload signature	Recommend
OLTP, very high update churn, narrow rows	ROW (low CPU cost)
OLTP/mixed with repeated values per page	PAGE
Warehouse fact table, mostly read, ≥ 1 M rows	Clustered columnstore
Old fact partitions, queried occasionally	Columnstore archive
Tiny tables (< a few MB)	None — overhead not worth it

🏢 Real-World DBA Note

Oracle DBA parallel: ROW ≈ Oracle Basic Compression; PAGE ≈ OLTP Compression (Advanced Compression option); columnstore + archive ≈ HCC (Hybrid Columnar Compression) on Exadata. The rule "compress hot in ROW, cold in COLUMNSTORE_ARCHIVE per partition" mirrors Oracle's Heat Map + ADO policies. :::

Bringing it together — partitioned + compressed warehouse table

-- Hot partitions PAGE-compressed (last 3 months) ; cold partitions COLUMNSTORE_ARCHIVE
ALTER TABLE dbo.Orders REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE ON PARTITIONS (1  TO 10),
      DATA_COMPRESSION = PAGE                ON PARTITIONS (11 TO 13),
      DATA_COMPRESSION = NONE                ON PARTITIONS (14));

This gives you fast modifications on the current partition, balanced older months, and maximum compression on archive months — all in one table.

Anti-Patterns

"Partition every large table." Partitioning helps sliding-window data lifecycle and partition elimination on the partition column. A 50 M-row table queried by non-partition columns gets no benefit — just management overhead. Use it because of an access pattern, not because the table is big.
"PAGE compression on everything for the storage savings." PAGE compression has CPU cost on writes. OLTP hot tables with high write rates can lose 5–10 % CPU. Use ROW compression for write-heavy, PAGE for read-heavy.
"COLUMNSTORE_ARCHIVE on the current partition for max savings." COLUMNSTORE_ARCHIVE is read-optimized (10–20× compression) but writes cost ~3×. Reserve it for partitions that are truly immutable.
"ALTER TABLE ... SWITCH is instant for any 2 partitions." Both partitions must have identical schema, indexes, constraints, and compression settings. Mismatched compression = the SWITCH is rejected. Plan compression at the partition scheme stage.
"Partition function = the index." No — the partition column should also be the leading column of the clustered index for best partition elimination. Otherwise the optimizer scans across partitions.
"Just RIGHT or just LEFT, doesn't matter." Off-by-one bugs at the boundary cost you a future weekend. Date partitioning = RANGE RIGHT (boundary date belongs to the new partition). Numeric ID partitioning = either, just be consistent.

⚠️ Watch Out

Compression is set per-partition, not per-table. When you ALTER TABLE ... REBUILD, you reset all partitions to the table-level compression unless you specify per-partition. Always use the explicit ON PARTITIONS (...) form for partitioned tables.

Migration Between Partitioning / Compression Strategies

From → To	Path	Cost
Non-partitioned → Partitioned (same data)	Create partition function + scheme, drop+recreate clustered index `ON scheme(col)`	Offline rebuild; sized as a full clustered index rebuild
Single-filegroup partitioned → Multi-filegroup partitioned	Modify partition scheme to map partitions to different FGs; rebuild affected partitions	Per-partition rebuild; gain backup/restore granularity
ROW → PAGE compression (whole table)	`ALTER INDEX ... REBUILD WITH (DATA_COMPRESSION = PAGE)`	One online rebuild (Enterprise/MI/Azure SQL); ~30 % more savings
PAGE → COLUMNSTORE (whole table)	`CREATE CLUSTERED COLUMNSTORE INDEX`	Drops rowstore CL index; analytical query gains; OLTP loses
Page/Row → mixed per-partition (hot/cold)	Per-partition `REBUILD WITH (DATA_COMPRESSION = ...)`	Per-partition online rebuild; biggest storage win
Sliding-window via DELETE → SWITCH OUT	Build empty staging table with matching schema; SWITCH	Near-instant metadata operation vs hours of DELETE
RANGE LEFT → RANGE RIGHT	Recreate partition function (no in-place change)	Full table rebuild via SWITCH or BCP

Most expensive moves: changing partition function direction (LEFT↔RIGHT) and rowstore↔columnstore conversion.

Real Scenarios

Sales fact table, 2 B rows, 5-year retention, monthly archive → RANGE RIGHT monthly partition function, partition scheme spreading partitions across 3 FGs, current month PAGE, prior 11 months PAGE, > 12 months COLUMNSTORE_ARCHIVE, monthly SWITCH OUT to archive table. Driver: lifecycle + storage. Trade-off: SWITCH script must be 100 % schema-aligned.
OLTP order table, 200 GB, write-heavy → No partitioning, ROW compression. Driver: writes dominate. Trade-off: fewer storage savings than PAGE.
Time-series IoT data, 10 TB/year, query last 7 days only → Daily partitions on event_time, columnstore on > 7-day partitions, rowstore on current 7. Driver: hot/cold split. Trade-off: 30+ partitions/month — automate via Elastic Job or SQL Agent.
Reporting cube replaced by SQL Server columnstore → Wide fact table, partitioned by date, columnstore index, no compression mixing. Driver: analytical queries. Trade-off: load performance — use staging table + SWITCH.
Multi-tenant SaaS: per-tenant partitions → Hash-style RANGE partition by tenant_id bucket (e.g., 16 buckets), PAGE compression. Driver: noisy-neighbor isolation at index level. Trade-off: re-balancing buckets is painful — size at design time.

Flashcards

What three objects make up a partitioned table in SQL?

Click to reveal answer

Partition function (boundary values), partition scheme (function → filegroups), and the table created ON the scheme.

1 / 7

Practice quiz

Q1/5

0 correct

You manage a fact table that loads 50 GB nightly and must purge data older than 24 months without blocking reads. Which technique should you implement?

Sources & further reading

Last verified against MS Learn: April 28, 2026.

Part 1 — Table partitioning​

What it is (and isn't)​

Architecture: function + scheme + table​

Boundary semantics — RANGE LEFT vs RANGE RIGHT​

Step-by-step build​

The sliding window pattern​

Inspect what you've built​

Switch-out rules cheat-sheet​

Part 2 — Database sharding (recommend)​

What sharding is​

When to recommend sharding​

Sharding patterns​

Elastic Database Tools — components​

Recommended workflow on the exam​

Part 3 — Data compression​

Compression types​

Apply compression​

Estimating savings before you commit​

Decision matrix​

Bringing it together — partitioned + compressed warehouse table​

Anti-Patterns​

Migration Between Partitioning / Compression Strategies​

Real Scenarios​

Flashcards​

Practice quiz​

Sources & further reading​