Exadata Networking Explained: InfiniBand vs RoCE | ExaGuru
Exadata Architecture Deep Dive · 2026 Edition

Exadata Networking Explained: Why Oracle Replaced InfiniBand with RoCE

For years, InfiniBand defined high-performance Oracle Exadata networking. Then Oracle began moving toward RoCE. The move boils down to years of advances in Ethernet, cloud infrastructure, and enterprise data center design — not a simple swap of cables.

Series: Exadata Engineering
Read: ~20 min
Audience: DBAs, Architects
Level: Intermediate → Advanced

01 · Introduction

If you've worked with Oracle Exadata for a while, you've probably heard InfiniBand described as one of the technologies that made Exadata so fast. It delivered ultra-low latency, high bandwidth, and Remote Direct Memory Access (RDMA), allowing Compute Nodes and Storage Cells to exchange data far faster than traditional Ethernet networks.

Since the Exadata X8M generation, Oracle Exadata systems have used RoCE (RDMA over Converged Ethernet) instead of InfiniBand for their internal fabric. At first glance, this might look like Oracle simply swapping one network technology for another, but the change reflects a decade of progress in Ethernet switching and the needs of cloud-scale data centers.

Understanding this evolution isn't just about networking — it explains how Oracle continues improving Smart Scan, Storage Cell communication, database scalability, and overall Exadata performance. Here is how it works under the hood, and why networking is one of the most overlooked reasons Oracle Exadata performs so well.

Q: Why did Oracle replace InfiniBand with RoCE in Exadata?
A: Oracle replaced InfiniBand with RoCE starting with Exadata X8M to take advantage of commodity, cloud-scale Ethernet infrastructure while keeping RDMA latencies in the low single-digit microseconds. RoCE fits into the same leaf-spine Ethernet designs used across OCI, without a Subnet Manager or InfiniBand switches.


02 · Why Is Networking So Important Inside Oracle Exadata?

To appreciate why Exadata networking is revolutionary, we must first examine how a traditional database server interacts with its storage.

Traditional Database Networking vs. Exadata

In a typical commodity database architecture, the database server (compute) communicates with a Storage Area Network (SAN) or Network Attached Storage (NAS) over protocols such as Fibre Channel, iSCSI, or NFS. When the database engine requests blocks of data:

  1. The database issues an I/O request through operating system calls.
  2. The request travels down the OS storage stack, crosses the storage network, and hits the controller.
  3. The storage array retrieves raw blocks and sends them back across the network.
  4. The database server's CPU services the incoming interrupts, processes TCP/IP or Fibre Channel protocol overhead, and, for buffered I/O, copies the data from kernel space into user space (the Oracle SGA).

During this entire lifecycle, the network is merely a passive pipe. It does not understand database blocks, SQL queries, or execution plans. If a query requires scanning a 10 TB table to find a single matching record, all 10 TB of data must travel across that network pipe to the database server's memory — creating severe bottlenecks, spiking CPU utilization, and inducing massive latency.
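To put the 10 TB example in rough numbers, the Python sketch below computes raw transfer time over a single link. The 16 Gbps Fibre Channel link and the 1% post-filter selectivity are illustrative assumptions rather than measurements, and the calculation ignores protocol overhead, contention, and disk read time.

```python
def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Time to push num_bytes through a link of link_gbps gigabits per second,
    ignoring protocol overhead, contention, and the time to read from disk."""
    return num_bytes * 8 / (link_gbps * 1e9)


TABLE_BYTES = 10e12   # the 10 TB table from the example (decimal terabytes)
LINK_GBPS = 16        # assumption: a single 16 Gbps Fibre Channel link
SELECTIVITY = 0.01    # assumption: filtering at the storage layer keeps 1% of the data

full_scan = transfer_seconds(TABLE_BYTES, LINK_GBPS)
filtered = transfer_seconds(TABLE_BYTES * SELECTIVITY, LINK_GBPS)
print(f"ship every block: {full_scan:,.0f} s")          # ship every block: 5,000 s
print(f"ship 1% after filtering: {filtered:,.0f} s")    # ship 1% after filtering: 50 s
```

Even before counting CPU and protocol overhead, moving every block takes about 83 minutes on this assumed link, while shipping only the filtered 1% takes under a minute. That gap is why Exadata pushes filtering down to the storage tier.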

The Exadata Paradigm Shift

Oracle Exadata splits processing between Compute Nodes (Database Servers) and Storage Cells (Storage Servers). Instead of shipping raw, unfiltered blocks over a passive network, Exadata executes Smart Scans: the Compute Node offloads SQL predicate evaluation directly to the Storage Cells, which scan local NVMe flash or disks, filter out irrelevant rows and columns, and return only the matching rows and requested columns to the Compute Node.

Figure 1 · Smart Scan offloading vs. traditional SAN block shipping

Because of this tightly coupled co-processing, internal network traffic between Compute Nodes and Storage Cells is incredibly chatty and dynamic. It handles:

  • Block Requests: Traditional single-block reads for OLTP operations.
  • Smart Scan Offloading commands: Metadata, predicate descriptions, and column selections sent to the storage cell.
  • Inter-node Cache Fusion traffic: Real Application Clusters (RAC) cache synchronization between Compute Nodes.

If the network adds latency, those microseconds are paid on every one of the millions of messages a busy system exchanges, and a dropped packet can stall sessions until it is retransmitted. In Exadata, networking is part of the database architecture, not just infrastructure. It directly affects how quickly RAC resolves global cache and lock requests, how quickly redo is written to storage, and how efficiently parallel query processes coordinate.


03 · What Is RDMA, and Why Does Exadata Rely on It?

At the heart of Exadata's high-performance networking fabric lies Remote Direct Memory Access (RDMA), a hardware-level technology that lets a network adapter read and write a remote computer's memory directly, without involving the remote system's CPU or either system's operating system kernel on the data path.

Understanding RDMA

In a standard network environment, if Server A wants data held in the memory of Server B, the two exchange it over a TCP/IP connection. Server B's application makes a system call to send the data, and its CPU copies the payload from application memory into a kernel socket buffer. On Server A, the Network Interface Card (NIC) raises an interrupt, the kernel processes the TCP/IP headers, and the CPU copies the data from the kernel buffer into the application's memory space. Each system call switches the CPU between user and kernel mode (often loosely called a context switch), and each copy burns CPU cycles and memory bandwidth.

RDMA changes this entirely by combining three techniques: kernel bypass, which lets applications hand work directly to the adapter without a system call; zero-copy communication, which lets the adapter place data straight into registered application memory without intermediate buffers; and CPU offloading, which leaves the host CPUs on both Compute Nodes and Storage Cells free to process SQL, manage execution plans, and run database logic. Network latency drops from tens or hundreds of microseconds with standard TCP/IP to single-digit microseconds.
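The difference between the two data paths can be illustrated with a toy Python model that simply counts work. Nothing here touches a real network adapter: `Host`, `tcp_transfer`, and `rdma_write` are hypothetical names, Python slice assignment stands in for the adapter's DMA, and only the counters carry meaning. Real RDMA applications use a verbs library such as libibverbs instead.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Counts data-path work done by one host's CPU (toy model, not a NIC API)."""
    name: str
    cpu_copies: int = 0   # payload copies made by this host's CPU
    syscalls: int = 0     # user/kernel transitions on the data path


def tcp_transfer(payload: bytes, sender: Host, receiver: Host) -> bytearray:
    """Two-sided TCP-style transfer: one system call and one CPU copy per host."""
    sender.syscalls += 1             # send()
    sender.cpu_copies += 1           # user buffer -> kernel socket buffer
    kernel_tx = bytearray(payload)
    kernel_rx = kernel_tx            # NIC DMA carries the bytes across (not a CPU copy)
    receiver.syscalls += 1           # recv()
    receiver.cpu_copies += 1         # kernel socket buffer -> application buffer
    return bytearray(kernel_rx)


def rdma_write(payload: bytes, sender: Host, receiver: Host,
               registered_region: bytearray, offset: int) -> None:
    """One-sided RDMA-style write: the adapter moves bytes between registered
    buffers, so neither host's counters change (sender/receiver are untouched)."""
    end = offset + len(payload)
    if end > len(registered_region):
        raise IndexError("write exceeds the registered memory region")
    # Slice assignment stands in for the RNIC's DMA into the receiver's memory.
    memoryview(registered_region)[offset:end] = payload


# Server B (the cell) holds the data; Server A (the compute node) receives it.
cell, compute = Host("cell"), Host("compute")
received = tcp_transfer(b"block-0042", cell, compute)

cell2, compute2 = Host("cell"), Host("compute")
region = bytearray(16)
rdma_write(b"block-0042", cell2, compute2, region, offset=0)
```

After the run, each host in the TCP-style transfer has made one system call and one payload copy, while the RDMA-style write leaves both hosts' counters at zero.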

When a database instance on an Exadata Compute Node needs a block cached in a Storage Cell's memory tier (Persistent Memory on X8M and X9M, or Exadata RDMA Memory, XRMEM, on X10M and later), it doesn't have to send an I/O request and wait for the cell server software (cellsrv) to schedule a thread. Instead, it uses RDMA to read the block directly from the remote storage cell's memory.
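The defining property of the remote read described above is that the storage cell's software never runs on the data path. The hypothetical `ToyRNIC` and `ToyCellServer` classes below sketch that idea: the cell registers a buffer once and hands out a remote key (rkey), after which reads are served by the adapter model alone. In real RDMA code, registration and reads go through verbs calls such as libibverbs' `ibv_reg_mr` and `ibv_post_send`; this is only a conceptual model.

```python
import secrets


class ToyRNIC:
    """Stands in for a storage cell's RDMA adapter: it serves one-sided reads
    from memory the cell registered earlier, without calling cell software."""

    def __init__(self) -> None:
        self._regions: dict[int, bytearray] = {}

    def register(self, buffer: bytearray) -> int:
        """Register a buffer and return the remote key (rkey) peers must present."""
        rkey = secrets.randbits(32)
        self._regions[rkey] = buffer
        return rkey

    def rdma_read(self, rkey: int, offset: int, length: int) -> bytes:
        region = self._regions.get(rkey)
        if region is None:
            raise PermissionError("unknown rkey")      # adapter rejects unregistered access
        if offset < 0 or length < 0 or offset + length > len(region):
            raise IndexError("read outside the registered region")
        return bytes(region[offset:offset + length])


class ToyCellServer:
    """Software request path (think cellsrv): counts how often it is asked to work."""

    def __init__(self, cache: bytearray) -> None:
        self.cache = cache
        self.requests_handled = 0

    def handle_read(self, offset: int, length: int) -> bytes:
        self.requests_handled += 1
        return bytes(self.cache[offset:offset + length])


cache = bytearray(b"AAAABBBBCCCCDDDD")   # four 4-byte "blocks" in the cell's memory tier
server = ToyCellServer(cache)
rnic = ToyRNIC()
rkey = rnic.register(cache)              # the cell hands the rkey to compute nodes once

block = rnic.rdma_read(rkey, offset=8, length=4)   # compute node reads block 2 directly
```

The read returns the third block while `server.requests_handled` stays at zero, and a request carrying the wrong rkey is rejected.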

Figure 2 · Traditional TCP/IP vs. RDMA communication path


04 · Why Was InfiniBand So Successful in Oracle Exadata?

From its earliest generations, Oracle Exadata used InfiniBand as its internal fabric. At the time, standard data center networks relied on 1 Gbps or 10 Gbps Ethernet running the standard TCP/IP stack.

Why Oracle Originally Chose InfiniBand

InfiniBand was built from the ground up for High-Performance Computing (HPC) clusters. It offered native, hardware-level support for RDMA, built-in credit-based flow control (which makes the fabric lossless: a sender transmits only when the receiver has buffer space, so packets are never dropped because of buffer overflows), and substantial bandwidth: Exadata used 40 Gbps QDR InfiniBand from the V2 generation through X8.
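Credit-based flow control is easy to see in a small simulation. The `CreditLink` class below is a hypothetical, simplified model (real InfiniBand tracks credits per virtual lane and exchanges them in link-level flow-control packets): the receiver advertises its free buffer slots as credits, and the sender transmits only while it holds one.

```python
from collections import deque


class CreditLink:
    """Toy credit-based flow control: each credit is one free slot in the
    receiver's buffer, and the sender transmits only while it holds a credit."""

    def __init__(self, receive_buffer_slots: int) -> None:
        self.capacity = receive_buffer_slots
        self.credits = receive_buffer_slots   # receiver advertises all slots up front
        self.rx_buffer: deque = deque()
        self.dropped = 0

    def try_send(self, packet: str) -> bool:
        if self.credits == 0:
            return False                      # sender waits: nothing sent, nothing dropped
        self.credits -= 1
        if len(self.rx_buffer) >= self.capacity:
            self.dropped += 1                 # unreachable while credits are honoured
        else:
            self.rx_buffer.append(packet)
        return True

    def receiver_consume(self) -> str:
        packet = self.rx_buffer.popleft()
        self.credits += 1                     # freed slot flows back to the sender as a credit
        return packet


link = CreditLink(receive_buffer_slots=4)
pending = [f"pkt{i}" for i in range(10)]
delivered = []
tick = 0
while pending or link.rx_buffer:
    if pending and link.try_send(pending[0]):
        pending.pop(0)
    if tick % 2 == 1 and link.rx_buffer:      # receiver drains at half the sender's rate
        delivered.append(link.receiver_consume())
    tick += 1
```

Although the sender is twice as fast as the receiver, all ten packets arrive in order and `link.dropped` stays at zero; the sender simply waits whenever its credits run out.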

Impact on Smart Scan and Cell Offloading

InfiniBand was the technology that allowed Exadata to scale horizontally. When a parallel query split a job across 32 parallel execution servers on 4 compute nodes, the inter-node traffic flowed over InfiniBand using Reliable Datagram Sockets (RDS), a low-overhead datagram protocol that also served as the transport for iDB, Exadata's database-to-storage protocol.

Because RDS avoided the overhead of the general-purpose TCP/IP stack, Exadata could exploit the full potential of Cell Offloading. When the storage cells filtered down a massive data set, InfiniBand was fast enough to stream the filtered results back to the compute nodes without congestion. For a decade, InfiniBand was an architectural advantage for Oracle Exadata.


05 · Why Did Oracle Begin Replacing InfiniBand with RoCE?

If InfiniBand was so fast and successful, why did Oracle begin replacing it starting with the Exadata X8M generation? The answer lies in the massive, industry-wide evolution of commodity Ethernet technologies and cloud-scale infrastructure deployment needs.

The Evolution of Ethernet

For decades, Ethernet was criticized as a lossy network, prone to collisions on shared segments and to dropped packets under heavy load. Hyperscale cloud providers and standards bodies invested heavily in fixing this. The Data Center Bridging standards behind Converged Ethernet, notably IEEE 802.1Qbb Priority Flow Control, brought hardware-level lossless behavior, long a hallmark of InfiniBand, to selected Ethernet traffic classes.

At the same time, Ethernet bandwidth climbed to 100 Gbps, 200 Gbps, and 400 Gbps at commodity price points, driven by manufacturing volumes that InfiniBand's smaller market could not match.

Cloud Integration and Scalability

Oracle's strategic direction shifted heavily toward the cloud — specifically Oracle Cloud Infrastructure (OCI) and Exadata Cloud@Customer. Managing massive cloud data centers that mixed InfiniBand fabrics for databases with traditional Ethernet fabrics for client compute created administrative and architectural complexity.

InfiniBand requires specialized host channel adapters (HCAs), InfiniBand cabling, and dedicated InfiniBand switches managed by a Subnet Manager. By moving Exadata to RoCE (RDMA over Converged Ethernet), Oracle unified its network fabric: Exadata could now use the same Ethernet switching technology and leaf-spine designs deployed across modern cloud data centers.

Figure 3 · OCI leaf-spine Ethernet topology with RoCE integration

Oracle did not abandon performance; it modernized the underlying layer. RoCE delivers fabric latency comparable to InfiniBand, in the low single-digit microseconds, and more bandwidth per port (100 Gbps on X8M and later, versus 40 Gbps QDR InfiniBand on earlier generations), while running over standard Ethernet connections.


06 · How Does RoCE Reduce Latency Compared to Traditional Ethernet?

To understand how RoCE achieves its low latency, we must contrast its architecture against standard Ethernet processing.

The Anatomy of a RoCE Packet

RoCE v2, the routable version of RoCE, keeps the InfiniBand transport layer (the Base Transport Header that identifies the RDMA operation, plus the RDMA payload) and replaces the InfiniBand link and network headers with standard Ethernet, IP, and UDP headers. RoCE v2 traffic is identified by UDP destination port 4791.

Because it uses standard UDP/IP headers, RoCE v2 traffic can be forwarded and routed by standard Ethernet switches and IP routers, provided the fabric is configured for lossless operation as described below. When a packet reaches a RoCE-capable network interface card (an RNIC), the card strips the Ethernet, IP, and UDP headers in hardware and executes the RDMA operation, placing the data directly into registered memory.
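To make the encapsulation concrete, the following sketch assembles the layers of a RoCE v2 packet with Python's `struct` module. It assumes IPv4, an RC SEND Only operation (opcode 0x04, which needs no extended transport header), a payload already padded to a multiple of 4 bytes, and made-up MAC and IP addresses; the 4-byte ICRC is left as a zero placeholder rather than the real invariant CRC. The IP header's ECN field is set to ECT(0) so switches could mark it, which ties in with the congestion mechanisms discussed next. It is a layout illustration, not a packet generator for a live fabric.

```python
import struct

ROCE_V2_UDP_PORT = 4791   # IANA-assigned UDP destination port for RoCE v2
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
ECN_ECT0 = 0b10           # ECN-capable transport; a congested switch may rewrite it to CE (0b11)


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over a 20-byte IPv4 header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def base_transport_header(opcode: int, dest_qp: int, psn: int, pkey: int = 0xFFFF) -> bytes:
    """12-byte InfiniBand Base Transport Header (BTH), carried unchanged by RoCE v2."""
    flags = 0  # solicited event, migration, pad count, and transport version all zero
    return struct.pack("!BBHII", opcode, flags, pkey,
                       dest_qp & 0xFFFFFF,   # 8 reserved bits + 24-bit destination queue pair
                       psn & 0xFFFFFF)       # AckReq bit + 7 reserved bits + 24-bit sequence number


def rocev2_frame(src_mac: bytes, dst_mac: bytes, src_ip: bytes, dst_ip: bytes,
                 src_port: int, bth: bytes, payload: bytes) -> bytes:
    if len(payload) % 4:
        raise ValueError("pad the payload to a multiple of 4 bytes (PadCnt is left at 0)")
    icrc = bytes(4)                                   # placeholder for the invariant CRC
    udp_len = 8 + len(bth) + len(payload) + len(icrc)
    udp = struct.pack("!HHHH", src_port, ROCE_V2_UDP_PORT, udp_len, 0)  # UDP checksum left at 0
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, ECN_ECT0, 20 + udp_len, 0, 0x4000,
                     64, IPPROTO_UDP, 0, src_ip, dst_ip)
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]   # fill in header checksum
    eth = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4)
    return eth + ip + udp + bth + payload + icrc


frame = rocev2_frame(
    src_mac=bytes.fromhex("020000000001"), dst_mac=bytes.fromhex("020000000002"),
    src_ip=bytes([192, 168, 10, 1]), dst_ip=bytes([192, 168, 10, 2]),
    src_port=49152,
    bth=base_transport_header(opcode=0x04, dest_qp=0x12, psn=1),  # 0x04 = RC SEND Only
    payload=b"DBBLOCK!",
)
```

The resulting 66-byte frame carries UDP destination port 4791, and the InfiniBand Base Transport Header starts right after the UDP header, at byte offset 42.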

Congestion Management: The Secret to Lossless Ethernet

Traditional Ethernet handles saturation by dropping packets and relying on the TCP layer to notice the drop and retransmit the data. This behavior is disastrous for database clusters, where dropped packets cause latency spikes, transaction timeouts, and in severe cases RAC node evictions.

RoCE prevents this through two critical standards:

Priority Flow Control (PFC)

Lets a congested switch or adapter send a pause frame to its upstream neighbor for one of eight priority classes (for example, the class carrying RDMA traffic), stopping that class while other traffic on the same link keeps flowing.
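The per-priority behaviour of PFC can be sketched as a toy simulation. `PfcIngressPort` and `upstream_send` are hypothetical names, priority 3 for RDMA traffic is an arbitrary choice, and the XOFF/XON thresholds are assumed values; real switches also reserve headroom buffer for frames already in flight when the pause is sent.

```python
PRIORITIES = 8  # PFC operates on the eight IEEE 802.1p priority classes


class PfcIngressPort:
    """Toy switch ingress buffer with per-priority PFC thresholds (assumed values)."""

    def __init__(self, xoff: int, xon: int) -> None:
        self.depth = [0] * PRIORITIES        # frames queued per priority
        self.paused = [False] * PRIORITIES   # pause state signalled to the upstream sender
        self.xoff, self.xon = xoff, xon

    def enqueue(self, priority: int) -> None:
        self.depth[priority] += 1
        if self.depth[priority] >= self.xoff:
            self.paused[priority] = True     # send a PFC pause frame for this priority only

    def dequeue(self, priority: int) -> None:
        if self.depth[priority]:
            self.depth[priority] -= 1
        if self.paused[priority] and self.depth[priority] <= self.xon:
            self.paused[priority] = False    # resume: pause frame with zero pause time


def upstream_send(port: PfcIngressPort, priority: int) -> bool:
    """The upstream sender honours the pause state of each priority independently."""
    if port.paused[priority]:
        return False                         # frame waits at the sender instead of being dropped
    port.enqueue(priority)
    return True


port = PfcIngressPort(xoff=6, xon=2)
RDMA_PRIO, BULK_PRIO = 3, 0                  # hypothetical class assignment
held_back, peak_rdma_depth, bulk_ever_paused = 0, 0, False
for tick in range(20):
    for _ in range(2):                       # RDMA class arrives twice as fast as it drains
        if not upstream_send(port, RDMA_PRIO):
            held_back += 1
    upstream_send(port, BULK_PRIO)           # the other class keeps flowing on the same link
    peak_rdma_depth = max(peak_rdma_depth, port.depth[RDMA_PRIO])
    bulk_ever_paused = bulk_ever_paused or port.paused[BULK_PRIO]
    port.dequeue(RDMA_PRIO)
    port.dequeue(BULK_PRIO)
```

The RDMA class repeatedly hits its XOFF threshold and gets paused, so frames wait at the sender (`held_back` grows) and the queue never exceeds six frames, while the bulk class on priority 0 is never paused.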

Explicit Congestion Notification (ECN)

Allows switches to mark packets when buffers start filling. The receiving RNIC sees the mark and returns a Congestion Notification Packet (CNP) to the sender, which throttles back its transmission rate before queues overflow.
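ECN's feedback loop can be modelled in a few lines. The sketch below is a simplified illustration under assumed parameters (link capacity, marking threshold, halving the rate on each marked interval, and an additive recovery step); the congestion control that RoCE adapters commonly implement, DCQCN, is considerably more elaborate. `simulate_ecn` is a hypothetical helper.

```python
def simulate_ecn(ticks: int, link_capacity: float, mark_threshold: float,
                 start_rate: float, recovery_step: float = 0.5) -> tuple[list[float], float]:
    """Toy ECN feedback loop (not DCQCN itself). Units are packets per tick.
    Returns the sender's rate history and the peak switch queue depth."""
    rate, queue, peak = start_rate, 0.0, 0.0
    history: list[float] = []
    for _ in range(ticks):
        queue = max(0.0, queue + rate - link_capacity)   # queue grows when input exceeds output
        peak = max(peak, queue)
        if queue > mark_threshold:
            # Switch sets the ECN Congestion Experienced mark, the receiving RNIC
            # returns a CNP, and the sender cuts its rate multiplicatively.
            rate *= 0.5
        else:
            rate = min(start_rate, rate + recovery_step)  # cautious additive recovery
        history.append(rate)
    return history, peak


rates, peak_queue = simulate_ecn(ticks=40, link_capacity=10.0,
                                 mark_threshold=20.0, start_rate=15.0)
```

With the sender starting 50% above link capacity, the queue climbs until it crosses the marking threshold at 25 packets, the rate is halved to 7.5, and the queue drains instead of overflowing.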

Production Comparison Scenario

Picture a massive end-of-month batch window hitting your cluster. On a traditional Ethernet fabric, a minor buffer overflow causes a packet drop. If TCP has to wait for its retransmission timeout (200 ms minimum by default on Linux), your application suddenly sees a latency spike of hundreds of milliseconds.

In contrast, an Exadata RoCE fabric handles the same crunch gracefully: as soon as a switch queue starts to fill, the switch sets the ECN mark, the sender throttles back, and PFC pauses the traffic class if the queue keeps growing. No packets are dropped, so latency stays in the microsecond range instead of jumping to a retransmission timeout.

Q: How does RoCE reduce database latency?
A: RoCE bypasses much of the traditional TCP/IP networking stack via kernel bypass and zero-copy RDMA. Under congestion, Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) prevent packet drops that would otherwise trigger 100ms+ TCP retransmit spikes in database clusters.


07 · How Does Networking Improve Smart Scan Performance?

Many DBAs believe Smart Scan performance is solely a function of CPU processing inside the storage cells and NVMe flash speeds. While those are vital components, the network fabric is what binds them together.

The Lifecycle of an Exadata Smart Scan Query

Let's trace how an enterprise SQL statement leverages the network fabric:

```sql
-- smart_scan_example.sql (Exadata, iDB offload)
SELECT   customer_id,
         SUM(transaction_amount) AS total_amount
FROM     enterprise_sales
WHERE    region       = 'NORTH'
AND      fiscal_year  = 2026
GROUP BY customer_id;
-- Predicate + column projection offloaded to Storage Cells via RoCE
```

  1. The user issues the query to the Compute Node.
  2. The Oracle Database kernel parses the query, chooses a full table scan with direct path reads (a prerequisite for Smart Scan), recognizes that enterprise_sales resides on Exadata storage, and builds an iDB (Intelligent Database protocol) request describing the predicates and the columns needed.
  3. The Compute Node sends this iDB request over the RoCE fabric to the Storage Cells that hold the table's data.
  4. On each Storage Cell, the cellsrv process consults Storage Indexes to skip regions that cannot contain matching rows, reads the remaining blocks from flash or disk, and applies the filters (region = 'NORTH' AND fiscal_year = 2026).
  5. Instead of sending back thousands of 8 KB database blocks filled with rows from other regions and years, the Storage Cells build a dense stream containing only the matching rows and the requested columns (customer_id, transaction_amount).
  6. This filtered stream travels back over the RoCE network using RDMA transfers directly into the memory of the requesting database process on the Compute Node (its PGA, since Smart Scans use direct path reads), where the GROUP BY aggregation is performed.
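The division of labour in steps 4 through 6 can be mimicked in plain Python. This is not Oracle code: the rows, `cell_smart_scan`, and `compute_group_by` are hypothetical, Storage Indexes are omitted, and "values shipped" stands in for bytes on the wire.

```python
from collections import defaultdict

# Hypothetical rows of enterprise_sales, spread across two storage cells.
CELLS = [
    [
        {"customer_id": 1, "region": "NORTH", "fiscal_year": 2026, "transaction_amount": 100.0, "channel": "web"},
        {"customer_id": 2, "region": "SOUTH", "fiscal_year": 2026, "transaction_amount": 50.0, "channel": "store"},
    ],
    [
        {"customer_id": 1, "region": "NORTH", "fiscal_year": 2026, "transaction_amount": 25.0, "channel": "store"},
        {"customer_id": 3, "region": "NORTH", "fiscal_year": 2025, "transaction_amount": 75.0, "channel": "web"},
    ],
]
PROJECTED = ("customer_id", "transaction_amount")


def matches(row: dict) -> bool:
    """WHERE region = 'NORTH' AND fiscal_year = 2026"""
    return row["region"] == "NORTH" and row["fiscal_year"] == 2026


def cell_smart_scan(rows: list[dict]) -> list[dict]:
    """Storage-cell side: filter rows, then keep only the projected columns."""
    return [{col: row[col] for col in PROJECTED} for row in rows if matches(row)]


def compute_group_by(streams: list[list[dict]]) -> dict[int, float]:
    """Compute-node side: SUM(transaction_amount) GROUP BY customer_id."""
    totals: dict[int, float] = defaultdict(float)
    for stream in streams:
        for row in stream:
            totals[row["customer_id"]] += row["transaction_amount"]
    return dict(totals)


streams = [cell_smart_scan(rows) for rows in CELLS]
result = compute_group_by(streams)

values_block_shipping = sum(len(row) for rows in CELLS for row in rows)   # every column of every row
values_smart_scan = sum(len(row) for stream in streams for row in stream)
```

Block shipping would move all 20 column values across the network; the offloaded scan moves 4, and the compute node still arrives at the same total of 125.0 for customer 1.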

Because the transport layer is RoCE, the transit time for the offloaded request and the returned data stream approaches the physical limits of the wire. The Compute Node spends very few CPU cycles on network stack processing, so it can start aggregating the incoming data immediately.

Figure 4 · Smart Scan data flow over RoCE — from iDB command to filtered result stream


08 · How Does RoCE Support Modern Oracle Exadata Workloads?

As organizations consolidate disparate systems into unified database clouds, internal networking requirements grow quickly. RoCE provides the performance foundation for these demanding environments.

OLTP and Cache Fusion Performance

In multi-node Real Application Clusters (RAC) running intensive OLTP workloads, blocks are frequently modified by different instances, and Oracle's Cache Fusion technology ships them across the cluster fabric. With RoCE, the network portion of an inter-node block transfer drops into the low single-digit microsecond range. That shrinks the traditional "RAC scaling tax" and helps well-designed applications scale closer to linearly as compute nodes are added, although application design still determines how much cross-instance contention remains.

AI Workloads and Oracle Database 23ai

Oracle Database 23ai adds AI Vector Search, which stores vector embeddings in the database and runs similarity searches there, and it can integrate with Large Language Models (LLMs). Vector workloads need significant memory and fast pipeline execution. High-throughput RoCE networks keep data moving quickly between the parallel query layer and the storage layer, so vector scans over large, unstructured data sets don't stall waiting on the infrastructure.

Exadata Cloud@Customer & Exadata Cloud Infrastructure

In multi-tenant cloud environments, isolation and bandwidth guarantees are essential. Mapping RoCE traffic to separate Virtual Local Area Networks (VLANs) with Quality of Service (QoS) settings isolates tenants and helps ensure that a massive data warehouse backup in one VM cluster does not disturb the low-latency OLTP transactions of a mission-critical financial database running on the same Exadata infrastructure.

Figure 5 · Oracle Exadata RoCE network architecture — compute, storage, and fabric


09 · Common Myths About Oracle Exadata Networking

Myth 1: Networking has little impact on database performance.

Reality: Standard database networking relies heavily on host CPUs to process the network stack. In highly concurrent systems, network latency and context switching can account for a significant share of database wait time. Exadata's use of RDMA over RoCE offloads most of this work to dedicated adapter hardware, which directly improves SQL response times.

Myth 2: RoCE is simply faster Ethernet.

Reality: Calling RoCE "just faster Ethernet" overlooks its key features. While it uses physical Ethernet cables, the RoCE network protocol completely bypasses the standard TCP/IP stack. It implements hardware-controlled congestion management (PFC/ECN) to convert a standard data link into a lossless cluster fabric capable of direct remote memory access.

Myth 3: InfiniBand was discontinued because it was slow.

Reality: InfiniBand remains very fast. Oracle shifted to RoCE to embrace open, industry-wide standards and to unify its hardware engineering across global OCI data centers. The change lets Exadata benefit from the broader evolution of Ethernet infrastructure.

Myth 4: RDMA only benefits storage reads.

Reality: RDMA is heavily used for inter-node communications within RAC clusters. It handles high-speed cache synchronization (Cache Fusion) and coordinates transactional lock management across multiple compute nodes, which helps prevent latency spikes during heavy OLTP workloads.

Myth 5: Standard corporate Ethernet provides identical performance.

Reality: A standard corporate Ethernet network is not configured for lossless operation (PFC and ECN), and standard NICs lack the RDMA hardware needed for kernel bypass. Without these components, the network can drop packets under heavy load, causing high database latency and query stalls.

Myth 6: Smart Scan works independently of networking.

Reality: Smart Scan depends on the network fabric. The iDB protocol carries offload requests to the Storage Cells and carries the filtered results back; without a high-bandwidth, low-latency fabric underneath it, those results cannot reach the Compute Nodes fast enough to avoid bottlenecks.


10 · Production Best Practices for Exadata Networking

  1. Maintain Firmware Alignment Across Switches: Exadata depends on strict software-hardware alignment. Apply Oracle-recommended Exadata software updates regularly using patchmgr, the Exadata patching utility, which also updates the RoCE switches. This keeps RoCE switch firmware aligned with RNIC driver levels on compute and storage nodes.
  2. Monitor Network Health via CLI Tools: Do not treat the internal network fabric as an unmonitored black box. Verify link state on every node with the tools appropriate to your generation, for example ibstatus on InfiniBand systems or ethtool on the RoCE interfaces of X8M and later. On storage cells, use cellcli -e "LIST METRICCURRENT WHERE name LIKE 'N_.*'" to examine network metrics; the CL_ metrics report cell-level CPU, memory, and file system utilization instead.
  3. Track Crucial Database Network Wait Events: When troubleshooting performance regressions in AWR reports, monitor gc cr block 2-way and gc current block 2-way (cluster interconnect transfer time), cell single block physical read (single-block fetch speed over the fabric), and cell smart table scan (time spent waiting for Smart Scan results from the cells).
  4. Implement Strict Physical Cable Management: RoCE fabrics run at 100 Gbps over QSFP28 copper or fiber optic cabling. Keep cables within their specified bend-radius limits. A crimped or damaged cable can cause CRC errors, retransmissions, and link flaps.
  5. Validate Configuration Changes via Exachk: Before making infrastructure changes, run Oracle Exachk. This automated health check verifies that RoCE switch settings, MTU sizes (typically 9000-byte Jumbo Frames for maximum throughput), and network isolation definitions (InfiniBand partitions or RoCE VLANs) conform to Oracle's best practices.
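For a quick look at these waits outside AWR, the sketch below averages them from V$SYSTEM_EVENT. It assumes the python-oracledb driver and a user with access to V$ views; the function names are hypothetical, and because V$SYSTEM_EVENT is cumulative since instance startup, AWR snapshot deltas remain the better tool for a specific interval. The RAC interconnect events appear under their V$SYSTEM_EVENT names, gc cr block 2-way and gc current block 2-way.

```python
# RAC and cell wait events worth watching, under their V$SYSTEM_EVENT names.
FABRIC_EVENTS = (
    "gc cr block 2-way",
    "gc current block 2-way",
    "cell single block physical read",
    "cell smart table scan",
)


def average_wait_us(rows) -> dict[str, float]:
    """rows: iterable of (event, total_waits, time_waited_micro) tuples."""
    return {event: (micro / waits if waits else 0.0) for event, waits, micro in rows}


def fetch_fabric_waits(user: str, password: str, dsn: str) -> dict[str, float]:
    """Cumulative averages since instance startup; AWR snapshots give interval deltas."""
    import oracledb  # python-oracledb; imported here so the helper above works without it

    sql = ("SELECT event, total_waits, time_waited_micro FROM v$system_event "
           "WHERE event IN (:1, :2, :3, :4)")
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql, list(FABRIC_EVENTS))
            return average_wait_us(cur.fetchall())
```

Call `fetch_fabric_waits` with your own user, password, and DSN; it returns the average wait in microseconds per event.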

11 · Comparison Table: InfiniBand vs. RoCE Inside Exadata

| Feature | InfiniBand (V2 through X8) | RoCE (X8M, X9M, X10M and later) |
|---|---|---|
| Physical medium | InfiniBand QSFP cables | Standard Ethernet QSFP28 copper/fiber |
| RDMA support | Native InfiniBand RDMA | RDMA over Converged Ethernet (RoCE v2) |
| Typical bandwidth | 40 Gbps QDR | 100 Gbps per port |
| Lossless fabric | Credit-based flow control (built in) | PFC + ECN (Priority Flow Control + Explicit Congestion Notification) |
| Switch management | InfiniBand Subnet Manager required | Standard Ethernet switch management |
| Cloud integration | Separate fabric from OCI Ethernet | Unified with OCI leaf-spine Ethernet topology |
| Host adapters | InfiniBand HCAs | RoCE-capable RNICs |
| Database-storage protocol | iDB over RDS | iDB over RDMA (RoCE v2) |
| Typical fabric latency | Single-digit microseconds | Single-digit microseconds (comparable) |
| Upgrade path | Cannot be converted to RoCE in place | RoCE-native from X8M onward |

Table 1 · InfiniBand vs. RoCE inside Oracle Exadata — engineering comparison


12 · Frequently Asked Questions

Can I upgrade an older InfiniBand Exadata machine to RoCE?

No. The shift from InfiniBand to RoCE requires different internal hardware: RoCE network adapters (RNICs) instead of InfiniBand HCAs, different cabling, and different internal switches. You cannot convert an InfiniBand-based Exadata system to RoCE through software updates alone.

Does RoCE mean that my Exadata database is now exposed to the public corporate network?

No. The internal RoCE fabric remains fully isolated on its own private network inside the Exadata rack. Client connections and corporate applications still connect through separate, dedicated client networks via the compute node interfaces.

What does "Lossless Ethernet" mean in relation to RoCE?

Lossless Ethernet refers to an Ethernet network configured with advanced features like Priority Flow Control (PFC) and Explicit Congestion Notification (ECN). These features prevent network switches from dropping packets when they become overloaded, providing the stable, reliable data flow required for direct memory access protocols.

Does Oracle Database 19c support RoCE networking fabrics?

Yes. Oracle Database 19c fully supports RoCE networking fabrics on modern Exadata platforms (such as the X8M, X9M, X10M, and newer generations), ensuring compatibility across enterprise environments.

How do Jumbo Frames affect Exadata RoCE performance?

Exadata configures its RoCE networks to use Jumbo Frames with a 9000-byte Maximum Transmission Unit (MTU). Larger frames reduce per-packet header overhead and the number of packets the adapters and switches must process, so large database transfers move across the network more efficiently.
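The arithmetic behind that efficiency gain is short. One detail matters for RDMA: the InfiniBand transport carries at most 4096 bytes of payload per packet, so a 9000-byte MTU helps mainly by allowing that maximum RDMA MTU instead of the 1024 bytes that fit in a standard 1500-byte frame. The sketch assumes IPv4 and packets without extended transport headers (an RDMA WRITE's first packet, for example, adds a 16-byte RETH); the helper names are hypothetical.

```python
WIRE_OVERHEAD = 8 + 14 + 4 + 12         # preamble+SFD, Ethernet header, FCS, inter-frame gap (bytes)
ROCE_HEADERS = 20 + 8 + 12 + 4          # IPv4, UDP, InfiniBand BTH, ICRC (bytes)
IB_MTUS = (256, 512, 1024, 2048, 4096)  # RDMA payload sizes the InfiniBand transport allows


def rdma_payload_for(ethernet_mtu: int) -> int:
    """Largest RDMA MTU whose payload plus IP/UDP/BTH/ICRC headers fits in the Ethernet MTU."""
    room = ethernet_mtu - ROCE_HEADERS
    return max(m for m in IB_MTUS if m <= room)


def wire_efficiency(ethernet_mtu: int) -> float:
    """Share of on-wire bytes that carry RDMA payload for a full-sized packet."""
    payload = rdma_payload_for(ethernet_mtu)
    return payload / (payload + ROCE_HEADERS + WIRE_OVERHEAD)


for mtu in (1500, 9000):
    print(mtu, rdma_payload_for(mtu), f"{wire_efficiency(mtu):.1%}")
# 1500 1024 92.6%
# 9000 4096 98.0%
```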

Is specialized training required for an Oracle DBA to manage a RoCE fabric?

For day-to-day database administration, your tasks remain largely the same. Infrastructure team members and cloud architects should familiarize themselves with Ethernet management tools, VLAN configurations, and RoCE metrics within Oracle Enterprise Manager to effectively monitor the network fabric.

Does the move to RoCE alter how Oracle RAC Cache Fusion functions?

The core concept of Cache Fusion remains unchanged, but its underlying transport mechanism is faster. By replacing the older InfiniBand layer with RoCE, Cache Fusion can transfer data blocks between compute instances with lower latency and higher reliability.

What happens if a RoCE switch fails within an Exadata rack?

Exadata racks are engineered with full hardware redundancy: two RoCE switches, with each node's dual-ported network adapter connected to both. If a switch fails, traffic continues over the surviving path, typically without interrupting database operations.


13 · The Short Version — 8 Things Every DBA Should Know

  1. Networking is as critical as CPUs and storage: Oracle Exadata performance depends on more than CPUs and storage; its internal networking architecture is equally critical.
  2. RDMA enables direct memory exchange: RDMA enables Compute Nodes and Storage Cells to exchange data directly through memory, reducing CPU overhead and latency.
  3. InfiniBand powered early Exadata generations: InfiniBand provided the high-speed, low-latency foundation that powered earlier generations of Oracle Exadata.
  4. RoCE brings RDMA to modern Ethernet: RoCE brings RDMA capabilities to modern Ethernet networks, combining enterprise flexibility with exceptional performance.
  5. RoCE bypasses the TCP/IP stack: RoCE reduces latency by bypassing much of the traditional TCP/IP networking stack, allowing faster communication between Compute Nodes and Storage Cells.
  6. Smart Scan depends on the network fabric: Technologies such as Smart Scan and Cell Offloading rely on efficient networking to deliver filtered data quickly and minimize unnecessary transfers.
  7. Modern Exadata benefits from RoCE scalability: Modern Oracle Exadata systems benefit from RoCE's scalability, compatibility with cloud environments, and support for next-generation workloads.
  8. The shift was evolution, not replacement: Oracle's move from InfiniBand to RoCE wasn't about replacing a successful technology; it was about evolving Exadata networking for the future while preserving the ultra-low latency that Exadata is known for.

Networking Is Part of the Database Engine

In Oracle Exadata, networking isn't simply the path between servers — it's part of the database engine itself. The faster Compute Nodes and Storage Cells communicate, the less time your database spends waiting and the more time it spends delivering results.

At ExaGuru, our Exadata Expert course covers ExaCC, ExaCS, RoCE fabric monitoring, Smart Scan tuning, and production migration patterns — because understanding the network layer is the first step to exploiting Exadata's full performance envelope.

ExaGuru — Oracle Cloud Training & Consulting
Exadata · ExaCC/ExaCS · OCI · Oracle DB Migration · Fusion ERP/HCM · Oracle Database 23ai & AI
Contact Us: +91-6394049607 · +91-9161111705
© 2026 ExaGuru. All rights reserved.