The necessary modularity of data centers to meet the challenge of AI
The sudden, large-scale democratization of artificial intelligence over the past year, driven in particular by new generative AI applications such as ChatGPT, has imposed a number of new technical requirements on the data centers that host these applications. The infrastructure supporting them will consume more energy, process more data and use more bandwidth than before, often in facilities built more than 20 years ago. These facilities must now adapt to these new orders of magnitude, especially in terms of power density per rack. The only practical way to achieve this is to adopt a modular design.
Data centers can seem like very static entities: huge buildings with rows of generators and a multitude of equipment, all carefully designed so that the facility runs without interruption, whether under normal operating conditions or during a total failure of the electrical grid. Modern data centers, however, are anything but static. Many facilities are designed from the outset to be highly modular, so that a floor can adapt to changes in network topology, airflow or physical redundancy several times a year if necessary. Why is this, and how is it achieved?
The widespread emergence of AI-related deployments in data centers shows how quickly customer needs can evolve. While in 2022 a data center operator could plan for an average draw of 10 kilowatts per rack of customer equipment, demand for racks of 25, 50 or even 100 kilowatts is now common and growing steadily. With a traditional static design, this creates many problems in terms of performance, maintenance and redundancy.
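To make the mismatch concrete, here is a minimal sketch of how a row planned around a uniform per-rack budget breaks down once mixed-density AI racks arrive. All figures are hypothetical and chosen only to mirror the 10/25/50/100 kW orders of magnitude mentioned above.

```python
# Hypothetical sketch: a row designed around a static 10 kW/rack budget
# receives a mixed deployment including dense AI racks.

DESIGN_BUDGET_KW = 10              # assumed per-rack budget at build time
row_kw = [8, 9, 25, 50, 100, 7]    # illustrative per-rack draw (kW)

total_design_kw = DESIGN_BUDGET_KW * len(row_kw)
total_actual_kw = sum(row_kw)
over_budget = [i for i, p in enumerate(row_kw) if p > DESIGN_BUDGET_KW]

print(f"row designed for {total_design_kw} kW, now drawing {total_actual_kw} kW")
print(f"rack positions exceeding the static budget: {over_budget}")
```

Even though only half the racks are dense, the row as a whole draws more than three times its design budget, which is exactly the situation a static power, cooling and redundancy plan cannot absorb.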
First of all, these dense racks often require more network bandwidth to operate at peak efficiency. This point is too often overlooked, and it can frustrate a customer who cannot deploy at high rack density without the bandwidth to match.
Secondly, an uneven increase in energy consumption across a data center floor can strain a cooling system that was never designed to handle such hot spots. A single dense rack at one end of a row can easily raise temperatures at the other end.
Finally, resilience and redundancy measures are calculated according to the location and distribution of specific electrical loads in the facility. If a very dense group of equipment is added in one area, a static design may no longer be able to guarantee the generators' electrical capacity.
Each of these concerns can have serious consequences, ranging from the inability to run AI equipment at its full performance potential to unwanted downtime during a power outage or voltage fluctuations on the local electricity grid. With a modular, highly adaptable design framework, these problems can be solved in any data center, regardless of its age.
For example, spaces can be reallocated, or set aside at build time, to serve as additional network rooms, allowing more circuits, switches and routers to be installed over time and the bandwidth delivered to the customer to grow. In parallel, a modular approach to designing and deploying overhead cable trays lets the operator physically bring that connectivity to the customer, something static designs often overlook. Some technologies used for AI, such as InfiniBand, rely on heavy, bulky cabling that can only be installed modularly if real performance and operational problems at the end of the line are to be avoided.
Understanding the real cooling state of a facility through CFD (Computational Fluid Dynamics) lets the operator identify trapped air, unintended airflow patterns that lead to suboptimal cooling, and areas with spare air capacity that can be used to cool dense, and particularly hot, AI deployments.
Many data centers can also be modular enough to move from an air-only cooling configuration to a hybrid one in which both air and liquid cooling (AALC and DLC) are available as needed, allowing AI deployments to take place within an existing data center or in a larger dedicated space.
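A back-of-envelope estimate (nowhere near a CFD model, but enough to show the trend) illustrates why air-only cooling runs out of headroom at AI densities. It uses the standard sensible-heat relation CFM = 3412 × kW / (1.08 × ΔT°F), where ΔT is the server inlet-to-outlet temperature rise; the 20 °F rise is an illustrative assumption.

```python
# Rough airflow requirement per rack, from the sensible-heat formula
# CFM = 3412 * kW / (1.08 * delta_T_F). Not a substitute for CFD.

def required_cfm(power_kw, delta_t_f=20.0):
    """Airflow (cubic feet per minute) needed to remove power_kw of heat
    at a given server inlet-to-outlet temperature rise in Fahrenheit."""
    return 3412 * power_kw / (1.08 * delta_t_f)

for kw in (10, 25, 50, 100):
    print(f"{kw:>3} kW rack -> ~{required_cfm(kw):,.0f} CFM")
```

A 100 kW rack needs roughly ten times the airflow of the 10 kW racks most older floors were designed around, which is precisely where liquid cooling (AALC or DLC) starts to make sense.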
With a modular power configuration – where the data center is conceived as a series of blocks, each with its own power, redundancy and cooling infrastructure – the basic components can be sized and deployed to match each customer's deployment, ensuring that as deployments are added to a space, they can be supported even if they differ significantly in energy consumption.
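The block idea can be sketched in a few lines. This is a toy model under stated assumptions, not a real facility planner: block names, capacities and loads are all hypothetical, and the placement rule is deliberately simple (first block with enough headroom).

```python
# Toy model of modular power blocks: each block carries its own capacity,
# and a new deployment is placed only where headroom exists.

from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    capacity_kw: float
    deployments_kw: list = field(default_factory=list)

    @property
    def headroom_kw(self) -> float:
        return self.capacity_kw - sum(self.deployments_kw)

def place(blocks, load_kw):
    """Place a deployment in the first block with enough headroom."""
    for b in blocks:
        if b.headroom_kw >= load_kw:
            b.deployments_kw.append(load_kw)
            return b.name
    return None  # no block fits: add a new block instead of overloading

blocks = [Block("A", 400), Block("B", 400)]
print(place(blocks, 300))  # fits in block A
print(place(blocks, 250))  # A has only 100 kW left, so goes to B
print(place(blocks, 200))  # neither fits: None, deploy another block
```

The point of the model is the failure mode: when no block fits, the answer is to add a block, not to stretch a shared static plant beyond its design, which is what keeps dissimilar deployments from undermining each other's redundancy.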
These are just a few examples of how a modular approach to data center design ensures that AI deployments, even at very high rack densities, can be supported in a high-performance, robust and cost-effective way in an existing data center.
Modular designs will make the difference between being able to support current and future generations of AI deployments in existing sites and having to build new ones.