1 · Exadata X9M Product Overview
These are original revision notes for the Exadata X9M Product Overview lesson. They describe the hardware and the smart software that make Exadata the engine behind Oracle Exadata Database Service on Oracle Database@Azure, in our own words rather than reproducing the recording.
Core message
Exadata X9M is a scale-out database platform, not a single server. It is built from two tiers — database servers and intelligent storage servers — joined by a fast internal RoCE fabric. What makes it different from ordinary hardware is the Exadata System Software: the storage tier is not a passive disk array, it actively offloads database work, places data across memory and flash automatically, and prioritises the most important I/O. The result is a single platform that is tuned at the same time for the fastest OLTP, the fastest analytics, and the best consolidation of many databases. On Oracle Database@Azure, X9M is the only shape offered, so every deployment inherits these characteristics.
Two server tiers
Everything in Exadata is organised around the split between compute and storage:
- Database servers run the Oracle software — the instances, the clustering, and the storage management layer.
- Storage servers hold the physical database files and run their own intelligent storage software.
- The two tiers talk over an internal RoCE network (RDMA over Converged Ethernet), not a general-purpose LAN or a traditional SAN.
This separation is what lets you grow compute and storage independently, and it is what allows the storage tier to do real work on the database's behalf.
Database servers
Each database server runs three things:
- Oracle Clusterware — clusters the database servers together so they act as one system.
- Automatic Storage Management (ASM) — the logical volume manager that organises the storage presented by the storage servers.
- Database software — the Oracle database instances themselves.
Hardware per database server on X9M:
- Two 32-core Intel Ice Lake processors → 64 cores per server, a 33% core increase over X8M.
- Memory expandable up to 2 TB.
- An 8-socket database server option for large symmetric multiprocessing (SMP) workloads.
Storage servers and tiered storage
The physical structure of the database lives in the storage servers, and the Exadata System Software decides where each piece of data sits. There are three media tiers inside a storage server, from fastest to largest:
| Tier | Media | Role |
|---|---|---|
| Hottest | Persistent Memory (PMEM) — 1.5 TB per storage server | Lowest-latency random I/O for OLTP |
| Warm | NVMe flash | High-bandwidth flash cache and Extreme Flash storage |
| Coldest | Hard disk (HDD) | High-capacity, lower-cost bulk storage |
Each storage server has two 16-core Intel Ice Lake processors, and data is automatically placed across PMEM, flash, and disk — hot data is promoted toward memory and flash without manual tiering.
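The promotion idea can be pictured with a toy model. This is a minimal sketch and not Exadata's placement algorithm: the `TieredStore` class, the slot counts, and the rule "promote one tier up on each read, evict least recently used" are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Toy three-tier store: small 'pmem', larger 'flash', everything on 'disk'.

    Hypothetical model: capacities are block counts, and the promotion and
    eviction rules are assumptions, not Exadata System Software behaviour.
    """

    def __init__(self, pmem_slots, flash_slots):
        self.pmem = OrderedDict()   # hottest tier, smallest
        self.flash = OrderedDict()  # warm tier
        self.pmem_slots = pmem_slots
        self.flash_slots = flash_slots

    def read(self, block_id):
        """Return the tier that served this read, then promote the block one tier up."""
        if block_id in self.pmem:
            self.pmem.move_to_end(block_id)  # mark as most recently used
            return "pmem"
        if block_id in self.flash:
            del self.flash[block_id]
            self._put_pmem(block_id)
            return "flash"
        self._put_flash(block_id)  # first touch: cache a copy in flash
        return "disk"

    def _put_pmem(self, block_id):
        self.pmem[block_id] = True
        if len(self.pmem) > self.pmem_slots:
            demoted, _ = self.pmem.popitem(last=False)  # least recently used
            self._put_flash(demoted)                    # falls back to flash

    def _put_flash(self, block_id):
        self.flash[block_id] = True
        self.flash.move_to_end(block_id)
        if len(self.flash) > self.flash_slots:
            self.flash.popitem(last=False)  # cached copy dropped; block stays on disk


store = TieredStore(pmem_slots=1, flash_slots=2)
print([store.read("a") for _ in range(3)])  # ['disk', 'flash', 'pmem']
```

Repeated reads of the same block move it from disk to flash to PMEM with no administrator action, which is the behaviour the notes describe.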
You also choose a storage server type when you size the system:
- High-Capacity (HC) — flash-cached hard disk for the best capacity-per-cost.
- Extreme Flash (EF) — all-flash for the highest bandwidth.
- Extended (XT) — low-cost capacity for cooler or archival data.
Why scale-out storage beats shared flash
A natural question is: why not just buy an all-flash array? The lesson's answer is about performance, not capacity:
- All-flash arrays let many servers share the flash capacity, but they cannot share the flash performance — the storage network becomes the bottleneck. As much as 96% of the flash performance can be lost behind that network.
- A single NVMe flash drive delivers around 11 GB/sec, which is already faster than a fast SAN link at roughly 4 GB/sec (a 32 Gb SAN). Adding more flash behind a SAN does not help once the link is saturated.
- Exadata avoids this by moving the compute to the data: it offloads query processing into the storage servers so only the results travel back. That only works if a single vendor owns the full stack — database, software, and storage.
- As a result, Exadata Smart Storage delivers flash bandwidth that approaches the aggregate DRAM bandwidth of the database servers, and performance scales as flash is added rather than flattening out.
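The bottleneck argument above is simple arithmetic, sketched here with the two figures from these notes (about 11 GB/sec per NVMe drive, about 4 GB/sec for a 32 Gb SAN link). The single shared link and the drive counts are assumptions chosen for illustration, and protocol overhead is ignored; this is one way to reach a loss figure of the order quoted, not necessarily how the lesson derived it.

```python
def gbit_to_gbyte(gbit_per_s):
    """Convert a link speed in gigabits/sec to gigabytes/sec (8 bits per byte)."""
    return gbit_per_s / 8.0


def fraction_lost(n_drives, drive_gb_per_s=11.0, link_gb_per_s=4.0):
    """Fraction of aggregate flash bandwidth that cannot pass through one shared link."""
    aggregate = n_drives * drive_gb_per_s
    delivered = min(aggregate, link_gb_per_s)
    return 1.0 - delivered / aggregate


print(gbit_to_gbyte(32))              # 4.0
print(round(fraction_lost(1), 2))     # 0.64
print(round(fraction_lost(9), 2))     # 0.96
```

Under these assumptions a single drive already loses roughly two thirds of its bandwidth behind the link, and adding drives only raises the fraction lost, because the delivered bandwidth stays capped at the link speed.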
Fastest OLTP
OLTP is about huge numbers of small, random reads and writes with the lowest possible latency. X9M targets this with:
- Scale-out storage + RDMA + Persistent Memory + NVMe flash working together.
- Direct PMEM I/O over RDMA — the database instance reads and writes directly to Persistent Memory in the storage servers, bypassing software overhead for ultra-low-latency random I/O. PMEM also acts as an automatic commit accelerator.
- The RoCE interconnect removes the cluster-coordination bottleneck that limits ordinary clustered databases.
- Physical-level failure detection — failing components are detected in hardware rather than waiting for a software timeout, so failover is faster.
X9M OLTP numbers worth remembering:
- 27.6 million read IOPS (8K I/Os) — 70% more than X8M.
- Under 19 µs OLTP I/O latency through PMEM.
- PCIe 4.0 dual-port active-active 100 Gb RoCE cards, and 33% more cores (32-core Ice Lake).
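For a feel of what the IOPS figure implies, the short calculation below converts it to bandwidth and backs out the baseline implied by "70% more". It assumes "8K" means 8 KiB (8,192 bytes); the helper names are hypothetical, and the derived values are implications of the quoted numbers rather than published figures.

```python
READ_IOPS = 27.6e6          # read IOPS quoted in the notes
IO_SIZE_BYTES = 8 * 1024    # assumption: "8K" means 8 KiB


def implied_bandwidth_gb_per_s(iops, io_size_bytes):
    """Bandwidth in GB/sec (10^9 bytes) implied by an IOPS rate and I/O size."""
    return iops * io_size_bytes / 1e9


def implied_baseline(new_value, percent_more):
    """Baseline value such that new_value is percent_more % above it."""
    return new_value / (1 + percent_more / 100)


print(round(implied_bandwidth_gb_per_s(READ_IOPS, IO_SIZE_BYTES), 1))  # 226.1
print(round(implied_baseline(READ_IOPS, 70) / 1e6, 1))                 # 16.2
```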
Fastest analytics
Analytics is about scanning large volumes of data quickly. Exadata pushes the work down into storage instead of dragging raw data up to the database servers:
- Smart Scan offloads predicate, join, and column filtering to the storage servers, so only the rows and columns that matter come back.
- Automatic data placement — data scanned from High-Capacity disk is copied into flash so the next SQL statement runs against flash.
- Storage Index tracks which regions hold which column values, so Smart Scan only reads the regions that can possibly match.
- Hybrid Columnar Compression typically reaches 10:1 (100 GB → 10 GB), shrinking both storage and scan time.
- In-Memory Columnar format is kept in flash and, with the Database In-Memory option (introduced in Oracle Database 12.1.0.2), in the database too — eliminating the need for separate analytic indexes.
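A toy version of Smart Scan combined with a Storage Index shows why less data is read and less data travels back. This is a conceptual sketch only: regions are modelled as small lists of rows, the index as a per-region (min, max) pair for one column, and the function names and sample data are hypothetical.

```python
def build_storage_index(regions, column):
    """Record the (min, max) of `column` for each region of rows."""
    return [
        (min(row[column] for row in region), max(row[column] for row in region))
        for region in regions
    ]


def smart_scan(regions, index, column, lo, hi, projection):
    """Return projected rows with lo <= row[column] <= hi, plus the count of regions read."""
    results, regions_read = [], 0
    for region, (rmin, rmax) in zip(regions, index):
        if rmax < lo or rmin > hi:
            continue  # index proves no row here can match: skip the read entirely
        regions_read += 1
        for row in region:
            if lo <= row[column] <= hi:                           # predicate filtering
                results.append({c: row[c] for c in projection})   # column projection
    return results, regions_read


regions = [
    [{"id": 1, "amount": 10, "note": "x"}, {"id": 2, "amount": 20, "note": "x"}],
    [{"id": 3, "amount": 150, "note": "x"}, {"id": 4, "amount": 90, "note": "x"}],
    [{"id": 5, "amount": 30, "note": "x"}, {"id": 6, "amount": 40, "note": "x"}],
]
index = build_storage_index(regions, "amount")
rows, regions_read = smart_scan(regions, index, "amount", 100, 200, ["id", "amount"])
print(rows, regions_read)  # [{'id': 3, 'amount': 150}] 1
```

Only one of the three regions is read, and only one row with two of its three columns is returned, which mirrors the "only the rows and columns that matter come back" point above.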
X9M analytics numbers:
- 1 TB/sec scan throughput.
- 87% faster analytic scan and smart flash cache throughput (new PCIe flash cards), with 80% faster RoCE network throughput from PCIe 4.0.
- 33% more CPU cores → more parallel execution servers, with automatic raw-to-columnar conversion.
Best consolidation
Because one platform can run OLTP and analytics well, it is also an excellent place to consolidate many databases — but only if one workload cannot starve another:
- Performance headroom lets Exadata host roughly 2× the consolidation density of comparable systems.
- Exadata automatically prioritises critical I/O over the interconnect and storage network — lock messages, Cache Fusion traffic, and logging are served first, and OLTP I/O is served ahead of analytic or batch I/O.
- The I/O Resource Manager lets you define a plan that controls I/O by workload importance.
- Workloads can be isolated at three levels: the server level, the virtual machine level, and the container database + pluggable database (CDB/PDB) level.
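The ordering described above (cluster-critical messages first, then OLTP, then analytic or batch I/O) behaves like a priority queue. The sketch below is illustrative only: the `IoQueue` class, the class names, and the numeric priorities are assumptions, and real prioritisation inside Exadata is considerably more involved.

```python
import heapq
from itertools import count

# Lower number = served first. The three levels follow the ordering in the
# notes; the numeric values themselves are arbitrary.
PRIORITY = {
    "lock": 0, "cache_fusion": 0, "log": 0,   # critical cluster traffic
    "oltp": 1,                                # latency-sensitive I/O
    "analytics": 2, "batch": 2,               # throughput-oriented I/O
}


class IoQueue:
    """Toy I/O queue that serves requests by workload class, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker preserves arrival order within a class

    def submit(self, kind, label):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), label))

    def drain(self):
        while self._heap:
            yield heapq.heappop(self._heap)[2]


q = IoQueue()
q.submit("analytics", "scan-1")
q.submit("oltp", "read-1")
q.submit("log", "redo-1")
q.submit("oltp", "read-2")
print(list(q.drain()))  # ['redo-1', 'read-1', 'read-2', 'scan-1']
```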
Elastic scale with zero downtime
You start small and grow online, one server at a time, with no downtime. Standard published X9M configurations:
| Configuration | Database cores | Memory (DRAM) | Usable disk¹ | All-flash option¹ |
|---|---|---|---|---|
| Eighth rack | 64 | 768 GB | 96 TB | — |
| Quarter rack | 128 | 1 TB | 192 TB | 44 TB |
| Full rack | 512 | 4 TB | 898 TB | 206 TB |
¹ Usable figures are shown after allocating space for high-availability mirroring; persistent memory is added on top for performance acceleration.
Beyond the named racks you use elastic configuration — add one compute server or one storage server at a time:
- Standard balanced configuration: 8 database servers + 14 storage servers.
- Maximise storage: 2 database servers + 18 storage servers.
- Maximise compute: 19 database servers + 3 storage servers.
At the hardware maximum, a single rack reaches 1,216 database cores and 38 TB of memory for compute, and 576 storage cores, 27 TB of PMEM, 920 TB of flash, and 3.8 PB of raw disk for storage. Multiple racks can be joined, but only RoCE-to-RoCE (X9M↔X9M or X9M↔X8M) — a RoCE system cannot be interconnected with an older InfiniBand-based system.
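The compute, storage-core, and PMEM maxima can be reproduced from the per-server figures given earlier in these notes. The sketch below uses only those figures; flash and disk totals are left out because per-server flash and disk capacities are not stated here. The `rack_totals` helper is hypothetical.

```python
DB_CORES_PER_SERVER = 2 * 32        # two 32-core processors per database server
DB_MEMORY_TB_MAX = 2                # memory expandable up to 2 TB per database server
STORAGE_CORES_PER_SERVER = 2 * 16   # two 16-core processors per storage server
PMEM_TB_PER_STORAGE_SERVER = 1.5


def rack_totals(db_servers, storage_servers):
    """Totals for one rack, assuming every database server has maximum memory."""
    return {
        "db_cores": db_servers * DB_CORES_PER_SERVER,
        "db_memory_tb": db_servers * DB_MEMORY_TB_MAX,
        "storage_cores": storage_servers * STORAGE_CORES_PER_SERVER,
        "pmem_tb": storage_servers * PMEM_TB_PER_STORAGE_SERVER,
    }


print(rack_totals(19, 3))   # maximise compute: 1216 cores, 38 TB memory
print(rack_totals(2, 18))   # maximise storage: 576 storage cores, 27.0 TB PMEM
print(rack_totals(8, 14))   # balanced: 512 database cores, as in the full-rack row
```

On this arithmetic the compute maxima come from the 19 + 3 configuration and the storage maxima from the 2 + 18 configuration, so the headline figures appear to describe two different racks rather than one rack that reaches every maximum at once.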
The internal network: RDMA over RoCE
The fabric between the two tiers is not a general-purpose LAN — it is built on RDMA (Remote Direct Memory Access) carried over RoCE (RDMA over Converged Ethernet), and it is the reason the storage tier can behave like an extension of database memory.
- RDMA moves large transfers with high throughput and low CPU overhead, because it reads and writes remote memory directly and bypasses the operating system and the I/O software stack.
- The Direct-to-Wire Protocol makes inter-node OLTP cluster messaging about 3× faster.
- Smart Fusion Block Transfer removes the log write that would normally happen when a block moves between nodes.
- RoCE delivers that RDMA speed and reliability on Ethernet at 100 Gb/sec — about 2.5× faster than 40 Gb InfiniBand — with zero packet-loss messaging and prioritisation of critical database messages, carrying OLTP, Cache Fusion, and logging on one fabric.
Exadata was the world's first RoCE-based database machine (introduced on X8M, standard on X9M). On Oracle Database@Azure this fabric is internal to the Exadata rack and is separate from the Azure VNet that carries client traffic.
Persistent Memory acceleration
Persistent Memory (PMEM) — Intel Optane — sits between DRAM and flash in the storage tier (1.5 TB per storage server) and fills the latency gap that flash alone leaves. Exadata uses it in three ways, all transparent to the application:
- Persistent Memory Data Accelerator — the storage servers place PMEM in front of flash, so the database reads remote PMEM over RDMA instead of doing I/O. The PMEM is auto-tiered and shared across all databases on the rack.
- Automatic Commit Accelerator — the database issues one-way RDMA writes to PMEM on multiple storage servers, bypassing network and I/O software, interrupts, and context switches; redo is flushed to flash and disk in the background. The result is up to 8× faster log writes.
- Mirrored for fault tolerance — PMEM is mirrored automatically across storage servers, and Exadata algorithms keep its data consistent through failures.
Hot data lives in PMEM, warm data in flash, and cold data on disk — placed automatically across the storage tier.
Maximum Availability Architecture (MAA)
Exadata is engineered for Oracle Maximum Availability Architecture (MAA) at three scopes. For this lesson, the product-level capabilities are:
- Within the Exadata rack — Full Fault Tolerance. Redundant hardware (servers, disks, flash, network, power) and redundant software (active clusters, disk and flash mirroring, redo-based replication with data-consistency checking) keep the system running through component failures.
- Within a site — Local Data Guard. A local standby provides HA failover over the LAN.
- Across sites — Data Guard for DR. A remote standby over the WAN supports disaster recovery plus online patching, reconfiguration, and expansion.
Exadata also adds the fastest RAC instance and node failure recovery, RMAN backup offload to the storage servers, deep ASM integration, the fastest Data Guard redo apply, and complete failure testing with the shortest brownouts. The HA and DR detail (Data Guard modes, failover, and backup strategy) is covered in Module 13 (High Availability and Disaster Recovery); here it is enough to know that Exadata provides the engineered foundation those features run on.
Industry-hardened full-stack security
Security is built into both the machine and the database, so the customer does not assemble it from parts:
- Machine security — AIDE intrusion detection, regular security scans, FIPS 140-2 and PCI-DSS alignment, data and network encryption, a minimal Linux distribution, secure erase, system lockdown, live kernel patching, and Secure RDMA Fabric Isolation so tenants cannot see each other's interconnect traffic.
- Oracle Database Maximum Security Architecture — Data Safe, Identity Management, Transparent Data Encryption, Network Encryption, Database Vault, Audit Vault, Key Vault, Database Firewall, Virtual Private Database, Label Security, Data Redaction, and Data Masking & Subsetting.
On Oracle Database@Azure these controls apply to the Exadata service the customer consumes — the platform is delivered already hardened.
Software intelligence that persists and accelerates
Beyond the headline OLTP and analytics numbers, the X9M storage software adds optimisations that survive restarts and cut CPU. Several are automatic and transparent from Exadata 21.2:
| Capability | What it does | Why it matters |
|---|---|---|
| Persistent Storage Index | Storage Indexes are persisted to M.2 SSD instead of only living in storage-server memory | No rebuild after planned or unplanned downtime — consistent performance before and after a Storage Software restart |
| Persistent Columnar Cache | The In-Memory columnar format kept in flash is persisted and restored on restart | Reduces database CPU and increases effective In-Memory capacity, with no warm-up penalty |
| Smart Scan Metadata Sharing | Storage servers share Smart Scan metadata | More parallel-query performance and scalability; works with JSON/XML, ML models, external tables, data mining, and large Bloom filters |
| ACFS I/O caching in Flash Cache | ACFS writes go to Cell Flash Cache and small sequential writes are coalesced into 1 MB writes | Up to 7× faster ACFS reads (e.g. GoldenGate trail files); honours existing IORM plans |
| Smart Scan Fast Decryption | Processor-cache-friendly filtering and projection on encrypted data | 2.4× faster decryption for Smart Scans on X9M |
| IORM Cluster Plan | Allocates storage-server resources by Grid Infrastructure cluster (shares and limits) | Consolidation at the cluster level — e.g. Sales 75% / Finance 25% |
All of these work with all supported Oracle Database versions.
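The Sales 75% / Finance 25% example in the IORM Cluster Plan row follows from relative shares. The sketch below shows only that arithmetic; the share values 3 and 1 are hypothetical, this is not IORM plan syntax, and how limits and contention are applied inside the storage servers is not modelled.

```python
def shares_to_percent(shares):
    """Convert relative shares per cluster into percentage entitlements."""
    total = sum(shares.values())
    return {name: 100 * value / total for name, value in shares.items()}


print(shares_to_percent({"sales": 3, "finance": 1}))
# {'sales': 75.0, 'finance': 25.0}
```

Shares are relative weights, so adding a third cluster changes every percentage without anyone editing the existing entries.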
Smarter system management
X9M also reduces the operational effort of running the platform:
- Oracle ASR support for RoCE switches: switch alerts propagate to Automatic Service Request (ASR) and are configured with a single command, for example `python bootflash:scripts/asr/bin/asr set endpoint="ASR_endpoint"`; the same applies to the Management Switch, which also receives software updates.
- Faster upgrades with ILOM pre-staging: firmware is staged in parallel ahead of time, saving over an hour per full-rack upgrade.
- Enhanced database-server alerting: Oracle Database and Grid Infrastructure incidents are visible through the `LIST ALERTHISTORY` command of `DBMCLI` from a single consolidated endpoint.
Exadata advantages increase every year
X9M is one point on a long line of engineered generations, each of which added hardware and software innovation. Because X9M is the only shape offered on Oracle Database@Azure, customers there inherit the whole accumulated stack.
Customer value
- One platform, three workloads — the same system serves OLTP, analytics, and mixed consolidation, so customers do not need separate stacks.
- Performance that scales — adding storage adds bandwidth, instead of hitting a shared-network ceiling like an all-flash SAN.
- Independent, online growth — compute and storage scale separately with no downtime, so capacity tracks the workload.
- Predictable consolidation — resource management and isolation let many databases share the platform without noisy-neighbour surprises.
- Engineered availability and security — MAA fault tolerance and full-stack hardening (TDE, Database Vault, Key Vault, Secure RDMA Fabric Isolation) come built in, not bolted on.
- Performance that survives restarts — persistent Storage Index and Columnar Cache mean consistent performance immediately after a Storage Software restart, with no warm-up penalty.
- Consistent on Oracle Database@Azure — because only the X9M shape is offered, every deployment gets the same engineered capabilities.
Risks and constraints to remember
- X9M is the only supported shape on Oracle Database@Azure — other Exadata shapes are not offered.
- The minimum Oracle Database@Azure footprint is a quarter rack — two database servers and three storage servers.
- On the Azure service, infrastructure scales elastically up to documented limits of 32 database servers and 64 storage servers, adjusted online without downtime; plan compute and storage growth separately.
- Multi-rack interconnect is RoCE-only — you cannot mix RoCE (X9M/X8M) with InfiniBand-based generations.
- HA/DR is a shared design point, not automatic — MAA gives the foundation, but Local/Remote Data Guard still has to be configured; the detail lives in Module 13.
- Some persistence and transparency features are version-gated — several (Smart Scan Metadata Sharing, ACFS flash caching) are automatic only from Exadata 21.2.
- Capacity numbers are usable after HA mirroring — size against usable, not raw, figures.
- Database In-Memory and Hybrid Columnar Compression are powerful, but their availability depends on licensing and edition; confirm what the customer's edition includes before promising results.
Terms to remember
- RoCE (RDMA over Converged Ethernet) — the high-speed internal fabric (100 Gb/s on X9M) that lets database servers read storage-server memory directly with very low latency.
- PMEM (Persistent Memory) — byte-addressable memory in the storage servers (1.5 TB each) used as the hottest storage tier and a commit accelerator.
- Smart Scan — storage-side offload of filtering and projection so the database receives only matching rows and columns.
- Storage Index — in-memory metadata in the storage servers that lets Smart Scan skip regions with no matching data.
- Hybrid Columnar Compression (HCC) — Exadata columnar compression, typically around 10:1.
- I/O Resource Manager (IORM) — the policy engine that prioritises I/O by workload importance for safe consolidation; an IORM Cluster Plan extends this to allocate storage resources per Grid Infrastructure cluster.
- CDB / PDB — container database and pluggable database, the isolation boundary for multitenant consolidation.
- RDMA (Remote Direct Memory Access): direct reads and writes of remote memory that bypass the OS and I/O stack; the basis of the RoCE fabric.
- Direct-to-Wire Protocol — inter-node OLTP messaging path that is ~3× faster.
- Smart Fusion Block Transfer — removes the log write when a block moves between nodes.
- Automatic Commit Accelerator — one-way RDMA writes of redo to PMEM on multiple storage servers for up to 8× faster log writes.
- MAA (Maximum Availability Architecture) — Oracle's reference HA/DR design: full fault tolerance in the rack, Local Data Guard in a site, Data Guard for DR across sites (detail in Module 13).
- Persistent Storage Index / Persistent Columnar Cache — storage-software state persisted to SSD/flash so it survives restarts with no rebuild or warm-up.
- `DBMCLI` `LIST ALERTHISTORY`: consolidated endpoint surfacing Oracle Database and Grid Infrastructure incidents for database-server alerting.
"When customers ask why Exadata is different from 'just a fast all-flash array', I keep it to one idea: Exadata moves the work to where the data lives. A normal storage array shares its flash capacity across servers but not its performance — the storage network caps you, and most of the flash speed is wasted. Exadata's storage servers are intelligent: they filter and scan data for the database, so only the answers travel back over the RoCE fabric. That is why the same box can give you 27.6 million OLTP IOPS, terabyte-per-second analytic scans, and safe consolidation of dozens of databases — and on Oracle Database@Azure you get exactly that X9M engine, scaling compute and storage online as you grow."