November 19, 2024

InfiniBand Series - Part 4: How InfiniBand and AI are Dramatically Reshaping Data Center Infrastructure

by: Rodney Willis

Welcome to Align’s InfiniBand Series!

Part I: Demystifying the InfiniBand vs. Ethernet Debate

Part II: A Closer Look at Cables, Termination Types & Fiber Innovations 

Part III: Building a Network for High-Performance Computing and AI

How InfiniBand and AI are Dramatically Reshaping Data Center Infrastructure

Before diving into this 4th installment of our InfiniBand series, let's do a quick recap of what we’ve discussed so far:

  • InfiniBand is a networking protocol for high-speed, low-latency transmission.
  • It was developed primarily by Mellanox, a company now owned by NVIDIA, making it a natural fit for supporting NVIDIA's GPUs.
  • Another popular transmission protocol, and an alternative to InfiniBand, is RoCE (pronounced like "Rocky"), which stands for RDMA over Converged Ethernet.
  • Network transmission speeds have hit ludicrous levels. We're no longer talking about 10 gig or 100 gig networks; we are now in the realm of 400 and 800 gigabit throughput and quickly pushing beyond.
  • Testing and cleaning are critical, but the processes and tools are being developed and evolving in real time.

Now that we have completed the first three installments of the Demystifying InfiniBand series, it’s important to understand that we can't think of a standard like InfiniBand in a vacuum. There are support systems, support networks, and specific infrastructures needed to enable Gen AI / machine learning workloads.

So, in this blog, we’re going to talk about the multiple systems required to support InfiniBand and, more broadly, GPU infrastructure. A critical component of this infrastructure is the high-speed networking that connects GPUs and enables them to work together as a single brain. While InfiniBand is a key technology in this space, it's part of a larger ecosystem of networking solutions that are rapidly evolving to meet the demand.


The Networking Backbone: Adapting to New Demands

This evolution is driving significant changes in data center networking infrastructure. The densification of fiber networks and the jump in transmission speeds, combined with the vast number of AI deployments, mark a significant inflection point for the data center infrastructure industry. Together they have tipped the scales from duplex to parallel transmission as the dominant type of installation we are seeing. One of the few blockers is the scarcity of MPO-16 / Base-16 MTP connectors, as demand has outstripped current manufacturing capacity. This shortage is pushing more of the mix back toward MPO-8 / Base-8, a portion of which would instead be 16-fiber connectors if MPO-16 were readily available. The expanding capacity requirements also justify the push toward newer, smaller form factor multi-fiber connectors such as the MMC and SN-MT. Those are waiting for the IEEE and the optics manufacturers to standardize on a footprint to help drive adoption.
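To put rough numbers on the duplex-to-parallel shift, here is a minimal, hypothetical sketch of the fiber math. The per-link fiber counts reflect common optic types (duplex 400G-FR4, parallel 400G-DR4 and 800G-DR8), and the 128-server, 8-uplink cluster is purely illustrative; actual counts depend on the optics and breakout scheme chosen.

```python
# Hypothetical fiber-count sketch: duplex vs. parallel optics for a GPU fabric.
# Fiber counts per link are typical values; confirm against the optics actually
# specified for a given deployment.

FIBERS_PER_LINK = {
    "400G duplex (FR4)": 2,    # one Tx/Rx pair over duplex LC
    "400G parallel (DR4)": 8,  # 4 Tx + 4 Rx lanes, Base-8 MPO
    "800G parallel (DR8)": 16, # 8 Tx + 8 Rx lanes, Base-16 / MPO-16
}

def fabric_fiber_count(gpu_servers: int, links_per_server: int, optic: str) -> int:
    """Total fibers needed for one tier of a GPU fabric."""
    return gpu_servers * links_per_server * FIBERS_PER_LINK[optic]

if __name__ == "__main__":
    # Example: 128 GPU servers, 8 uplinks each (one per GPU), per optic type.
    for optic in FIBERS_PER_LINK:
        total = fabric_fiber_count(gpu_servers=128, links_per_server=8, optic=optic)
        print(f"{optic:>22}: {total:,} fibers for one switch tier")
```

The jump from 2 to 8 or 16 fibers per link is exactly what drives the demand for Base-8 and Base-16 multi-fiber connectors described above.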

The next roadblock is the testers. Field testers do not yet offer the new connector interfaces without resorting to hybrid patch cords, and MPO-16 still requires a Y cable and multiple test passes. This becomes cumbersome when parsing thousands of tests while aggregating and reviewing test data.
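As a small illustration of that aggregation burden, here is a minimal, hypothetical sketch that merges the two Y-cable passes an MPO-16 link currently requires into a single pass/fail record per link. The field names, file name, and loss limit are assumptions, not any particular tester's export format.

```python
# Hypothetical sketch: merging the two Y-cable passes an MPO-16 link needs
# into a single result. Field names and the loss limit are illustrative only.
import csv
from collections import defaultdict

LOSS_LIMIT_DB = 0.75  # assumed per-connector insertion-loss budget

def merge_results(rows):
    """Group per-leg fiber measurements by link ID and flag any fiber over the limit."""
    links = defaultdict(list)
    for row in rows:  # each row: one fiber measured on one Y-cable leg
        links[row["link_id"]].append(float(row["insertion_loss_db"]))
    return {
        link_id: ("PASS" if len(losses) == 16 and max(losses) <= LOSS_LIMIT_DB
                  else "FAIL/INCOMPLETE")
        for link_id, losses in links.items()
    }

if __name__ == "__main__":
    with open("mpo16_test_export.csv", newline="") as f:  # assumed export file
        summary = merge_results(csv.DictReader(f))
    for link_id, verdict in sorted(summary.items()):
        print(link_id, verdict)
```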

How Fiber Advancements Cascade Through Data Center Design

While these advancements in fiber optic networking are crucial for supporting modern AI and GPU infrastructures, they also have far-reaching implications for overall data center design and operation. Let's explore how these changes ripple through various aspects of data center infrastructure.

More fiber and additional routes in mesh topologies demand increased conveyance considerations and consume more cabinet space. This necessitates meticulous cable management to facilitate easy tracing and troubleshooting while ensuring cable bundles don't impede equipment airflow. It's crucial to maintain proper bend radii, protect pull points, and minimize strain on MPO connectors to prevent micro or macro bend fractures in the fiber.

We've also witnessed a paradigm shift in redundancy design. The traditional A-side/B-side mentality has given way to a more integrated approach, reflecting the interconnected nature of GPU clusters, where losing a single node could be catastrophic to an AI training run. This shift extends to power distribution, with three power sources replacing the typical A/B configuration. The reason lies in the power redundancy designed into the servers themselves, which require a block-redundant configuration. For example, a majority of H100 servers require 6 of 8 power supplies to be up for the server to remain operational. In a typical A/B configuration, where the 8 power supplies are split across 2 power systems, a failure of either system would result in the shutdown of the server or cabinet.
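To make that arithmetic concrete, here is a minimal, hypothetical sketch (not any manufacturer's actual design) that checks whether a server needing 6 of its 8 power supplies can ride through the loss of a single feed, and how many supplies any one feed can carry before it can't.

```python
# Hypothetical sketch of the "6 of 8 power supplies" redundancy math described
# above. PSU-to-feed layouts here are illustrative, not a specific design.

def survives_feed_loss(psus_per_feed: list[int], required_up: int) -> bool:
    """True if the server keeps >= required_up PSUs powered after losing any one feed."""
    total = sum(psus_per_feed)
    return all(total - lost >= required_up for lost in psus_per_feed)

def max_psus_per_feed(total_psus: int, required_up: int) -> int:
    """Largest PSU group a single feed may carry and still be safe to lose."""
    return total_psus - required_up

if __name__ == "__main__":
    TOTAL, REQUIRED = 8, 6  # e.g., a server needing 6 of its 8 PSUs online

    print(survives_feed_loss([4, 4], REQUIRED))   # classic A/B 4+4 split -> False
    print(max_psus_per_feed(TOTAL, REQUIRED))     # -> at most 2 PSUs per lose-able feed
    # With 4 PSUs on each of two feeds, losing either feed leaves only 4 of 8
    # supplies -- below the 6 required -- so the server (and cabinet) goes down.
    # That is why designs are moving to additional power sources and
    # block-redundant schemes rather than relying on a simple A/B split.
```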

The density of compute power has skyrocketed. What once required 10,000 square feet to house 1 MW of capacity can now be condensed into a single row. This concentration creates new bottlenecks, particularly in electrical distribution. 415V/400A distribution to the cabinet level has become the norm in recent years, but it is proving to be a limiting factor when we talk about powering a full row of AI clusters at 100+ kW per cabinet.
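For a rough sense of the electrical math, here is a minimal, hypothetical sketch of the usable power behind a 415V/400A three-phase feed. The 80% continuous-load derating, unity power factor, and the ten-cabinet row are assumptions for illustration only; check local code and the actual design.

```python
# Hypothetical back-of-the-envelope: usable power from a 415 V / 400 A
# three-phase feed versus AI cabinet loads. Derating is an assumed 80% rule.
import math

def feed_capacity_kw(volts_ll: float, amps: float, derate: float = 0.8,
                     power_factor: float = 1.0) -> float:
    """Approximate usable kW from one three-phase feed."""
    kva = math.sqrt(3) * volts_ll * amps / 1000.0
    return kva * derate * power_factor

if __name__ == "__main__":
    per_feed = feed_capacity_kw(415, 400)   # roughly 230 kW usable per feed
    row_load = 10 * 120                     # e.g., ten 120 kW AI cabinets in a row
    print(f"Usable per 415V/400A feed: ~{per_feed:.0f} kW")
    print(f"Example row of ten 120 kW cabinets: {row_load} kW")
    print(f"Feeds needed (before redundancy): {math.ceil(row_load / per_feed)}")
    # One feed covers a single high-density cabinet with some headroom, but as
    # cabinet loads climb and redundant feeds are layered on, that headroom
    # shrinks and a full row becomes a megawatt-class load on the upstream
    # distribution.
```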

It is clear that liquid to the rack and to the chip will become requirements as AI workloads evolve. We are not yet seeing universal adoption of a standard approach with regard to direct-to-chip cooling, rear door heat exchangers, single- or two-phase liquid to the chip, and so on. Liquid cooling solutions, whether direct-to-chip or rear door heat exchangers, introduce new considerations: space for manifolds within racks, hose connection types, CDU configurations, and leak detection, to name a few. The lack of standardization adds another layer of complexity to the design effort and increases the need for flexible, modular designs.

And finally, we're seeing the need for a fundamental change in design philosophy. For decades, data centers were designed from the outside in. Now, the specific requirements of modern servers demand an inside-out approach. We're moving beyond merely "Server Ready" environments to "Specific Server Ready" setups. A recent project illustrates this point: two different server manufacturers in an AI cluster, both using liquid cooling and rear door heat exchangers, required different numbers of hose connections. Without this specific understanding, deployments risk delays or cost overruns in attempts to "future-proof" the infrastructure.

The increasing density of systems and components competing for limited space demands careful planning and design. Balancing the needs of containment walls, conveyance systems, and maintaining appropriate distances for transmission protocols requires a holistic approach. In this new landscape, every detail matters, and the interplay between various subsystems can make or break a data center's efficiency and effectiveness.

Keep reading to explore Parts I, II, and III of Align's InfiniBand Series: 

Part I: Demystifying the InfiniBand Versus Ethernet Debate

Part II: A Closer Look at Cables, Termination Types & Fiber Innovations 

Part III: Building a Network for High-Performance Computing and AI


 

Speak to an Align Data Center Expert: 

Visit our website to learn more about our Data Center Solutions by clicking here, or contact us today to get in touch with one of our team members. 

And of course, stay tuned for more on this topic as we continue with our InfiniBand Series over the coming weeks. 
