NVIDIA DSX

Introduction

For years, building an AI data center meant assembling a puzzle from dozens of vendors, each speaking a different language. Chips from one company, software from another, cooling from a third, and a facility design that nobody had fully validated end to end. The result was expensive guesswork at massive scale.

NVIDIA’s answer is the DSX platform, announced at GTC Taipei on May 31, 2026. It is a full-stack, co-designed framework that gives infrastructure builders a single validated playbook to design, deploy, and operate AI factories at scale. The goal is simple: turn every megawatt into more intelligence, at the lowest possible token cost.

What Is an AI Factory?

Before diving into DSX, it helps to understand what NVIDIA means by an “AI factory.” The term refers to large-scale infrastructure purpose-built to produce AI outputs, primarily tokens from large language models and other AI workloads. Unlike traditional data centers that run diverse general-purpose workloads, AI factories are optimized around a single question: how many tokens can be produced per megawatt of power consumed, and at what cost?

This framing matters because it changes how every layer of the stack is designed. Networking must support extreme east-west GPU communication. Storage must feed models without becoming a bottleneck. Cooling must handle dense GPU clusters running at sustained high utilization. Power must be managed dynamically to stay within budget while maximizing throughput. DSX is designed with all of these constraints baked in from the beginning.
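The tokens-per-megawatt framing can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the FactorySnapshot type, its field names, and the sample figures are assumptions made for this example, not part of DSX or any NVIDIA API, and it counts electricity cost alone.

```python
from dataclasses import dataclass


@dataclass
class FactorySnapshot:
    """Hypothetical operating summary for an AI factory over some period."""
    tokens_produced: float       # tokens generated during the period
    avg_power_mw: float          # average facility power draw, in megawatts
    hours: float                 # length of the period, in hours
    energy_price_per_mwh: float  # assumed electricity price, dollars per MWh


def tokens_per_mwh(s: FactorySnapshot) -> float:
    """Tokens produced per megawatt-hour of energy consumed."""
    energy_mwh = s.avg_power_mw * s.hours
    return s.tokens_produced / energy_mwh


def energy_cost_per_million_tokens(s: FactorySnapshot) -> float:
    """Electricity cost of one million tokens (energy only, no capital cost)."""
    return s.energy_price_per_mwh / tokens_per_mwh(s) * 1_000_000


# Made-up numbers: 40 billion tokens in one hour at 100 MW and $80 per MWh.
snapshot = FactorySnapshot(4e10, 100.0, 1.0, 80.0)
print(tokens_per_mwh(snapshot))
print(energy_cost_per_million_tokens(snapshot))
```

Anything that raises tokens per megawatt-hour at a fixed electricity price lowers the energy cost per token in direct proportion, which is why the metric shapes every layer of the stack.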

The DSX Platform: What It Covers

DSX spans the full stack, from silicon and systems to infrastructure software, facility design, and partner integrations. Here is a breakdown of each major component.

DSX MaxLPS

MaxLPS stands for Maximum Liquid Performance per Slot, and it is the efficiency engine of the platform. The core idea is to run as many GPUs as possible within a fixed power budget while keeping them at their most energy-efficient operating point.

Key capabilities include:

  • 45-degree Celsius liquid cooling, which allows higher GPU density without thermal throttling
  • In-rack power optimization technologies that tune performance-per-watt at the hardware level
  • Ability to run up to 40% more GPUs within the same power envelope compared to standard configurations
  • Minimal impact on workload performance despite the efficiency constraints

For operators paying for power by the megawatt, this directly translates to lower token costs and higher revenue per unit of infrastructure spend.
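A rough worked example shows how the 40% figure feeds into token cost. This is a back-of-the-envelope sketch under stated assumptions: the per-GPU wattages and the 5% performance penalty are invented for illustration, the source says only that the impact on workload performance is minimal, and the cost ratio covers energy only, not the capital cost of the additional GPUs.

```python
def gpus_in_budget(budget_w: int, per_gpu_w: int) -> int:
    """Whole GPUs that fit inside a fixed power budget (integer watts)."""
    return budget_w // per_gpu_w


def relative_energy_cost_per_token(gpu_gain: float, perf_retained: float) -> float:
    """Energy cost per token relative to baseline (1.0 means unchanged).

    gpu_gain:      fractional increase in GPU count at the same power
                   (0.40 means 40% more GPUs)
    perf_retained: per-GPU throughput relative to baseline
                   (0.95 means each GPU is 5% slower; an assumed value)
    """
    throughput_ratio = (1.0 + gpu_gain) * perf_retained
    return 1.0 / throughput_ratio


# Illustrative figures: a 1 MW budget, 1000 W per GPU at baseline,
# 714 W per GPU at a more efficient operating point.
baseline = gpus_in_budget(1_000_000, 1000)
tuned = gpus_in_budget(1_000_000, 714)
print(baseline, tuned)
print(relative_energy_cost_per_token(0.40, 0.95))
```

With these assumed inputs the same megawatt hosts 1,400 GPUs instead of 1,000, and the energy cost per token falls to roughly three quarters of the baseline.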

DSX OS

DSX OS is an open source, modular software stack purpose-built for AI factory operations. It is not a general-purpose Kubernetes distribution or a cloud management platform bolted onto GPU clusters. It is designed specifically for the operational complexity of multi-tenant, production AI infrastructure.

Core capabilities include:

  • Lifecycle management: Automated provisioning, upgrades, and decommissioning of AI factory nodes
  • Intelligent scheduling: Workload placement optimized for GPU topology, power constraints, and tenant requirements
  • Runtime consistency: Ensuring reproducible environments across heterogeneous hardware generations
  • Health automation: Continuous monitoring and automated remediation of hardware and software failures
  • Resiliency: Fault tolerance at the cluster level, not just the node level
  • Multi-tenant operations: Secure isolation between tenants sharing the same physical infrastructure
  • Platform services: Common APIs and integrations that ecosystem partners can build on

Partners already integrating DSX OS components include Red Hat, Mirantis, Rafay, Spectro Cloud, OpenNebula Systems, vCluster, Supermicro, Vultr, and several others.
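To make the scheduling capability more tangible, here is a minimal sketch of power-aware placement. It does not reflect the actual DSX OS scheduler or its APIs; the Rack and Job types, the best-fit rule, and all figures are assumptions chosen for illustration. Keeping a job on a single rack is used as a crude stand-in for topology awareness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rack:
    name: str
    free_gpus: int
    power_headroom_kw: float  # power still available under the rack's cap


@dataclass
class Job:
    name: str
    gpus: int
    kw_per_gpu: float  # assumed steady-state draw per GPU


def place(job: Job, racks: list[Rack]) -> Optional[str]:
    """Best-fit placement: choose the feasible rack that leaves the fewest idle GPUs."""
    need_kw = job.gpus * job.kw_per_gpu
    feasible = [
        r for r in racks
        if r.free_gpus >= job.gpus and r.power_headroom_kw >= need_kw
    ]
    if not feasible:
        return None  # a real scheduler would queue the job or rebalance
    best = min(feasible, key=lambda r: r.free_gpus - job.gpus)
    # Reserve both the GPUs and the power so later jobs see the updated state.
    best.free_gpus -= job.gpus
    best.power_headroom_kw -= need_kw
    return best.name


racks = [Rack("rack-a", 8, 5.0), Rack("rack-b", 16, 20.0)]
print(place(Job("train-1", 8, 1.0), racks))
```

Here rack-a has enough free GPUs but not enough power headroom, so the job lands on rack-b. Treating power as a first-class constraint alongside GPU count is the point of the example.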

DSX Reference Design

The Reference Design is the validated architectural blueprint for an AI factory. It covers every physical and logical layer:

  • Compute configuration and GPU cluster topology
  • High-speed networking (InfiniBand or Ethernet) for GPU-to-GPU communication
  • Storage architecture for model weights, checkpoints, and datasets
  • Hardware cluster design including rack layout and cable management
  • Facilities infrastructure: power distribution, cooling plant design, UPS systems
  • Controls and monitoring for both IT and operational technology systems
  • Civil, structural, and architectural design guidelines for new facility construction

The value here is validation. NVIDIA has run these configurations at scale and knows they work. Infrastructure builders do not have to discover failure modes in production.

DSX Sim

DSX Sim is a high-fidelity simulation layer for the entire AI factory lifecycle. The pitch from Jensen Huang was direct: simulate the entire factory before spending a dollar, and validate performance before a single rack is installed.

DSX Sim covers:

  • Planning and design phase: Model different architectural choices and their cost and performance implications before committing to hardware
  • Deployment phase: Simulate installation sequences, cooling behavior, and power draw to catch issues before they become expensive problems
  • Operations phase: Run what-if scenarios for capacity expansion, workload changes, or infrastructure upgrades
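The what-if idea can be illustrated with a deliberately simple capacity check. This is nowhere near a high-fidelity digital twin and does not use any DSX Sim interface; the Facility type, the PUE-based power model, and the numbers are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    power_capacity_kw: float    # utility feed available for IT load plus overhead
    cooling_capacity_kw: float  # heat the cooling plant can reject
    pue: float                  # assumed power usage effectiveness (total / IT)


def what_if(facility: Facility, racks: int, kw_per_rack: float) -> dict:
    """Check a proposed rack count against power and cooling limits."""
    it_load_kw = racks * kw_per_rack
    total_kw = it_load_kw * facility.pue
    return {
        "it_load_kw": it_load_kw,
        "total_kw": total_kw,
        "power_ok": total_kw <= facility.power_capacity_kw,
        # Nearly all IT power ends up as heat the plant has to remove.
        "cooling_ok": it_load_kw <= facility.cooling_capacity_kw,
    }


site = Facility(power_capacity_kw=10_000, cooling_capacity_kw=9_000, pue=1.2)
for racks in (60, 66, 70):
    print(racks, what_if(site, racks, kw_per_rack=130))
```

Even a toy model like this shows the value of asking the question before buying hardware: in this example the expansion to 66 racks fits the cooling plant but exceeds the power feed.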

System manufacturers including Quanta Cloud Technology and Pegatron are working with Dassault Systèmes to build a live AI factory digital twin configurator that automates the process from rack design to facility deployment. This work expands the NVIDIA Omniverse DSX Blueprint ecosystem and deepens integration with simulation software partners Cadence, PTC, and Siemens.

DSX Flex

DSX Flex addresses one of the most pressing constraints facing large-scale AI infrastructure: power grid access. As AI factories consume hundreds of megawatts, their relationship with the utility grid becomes a strategic concern, both for cost and for grid reliability.

DSX Flex connects AI factories to power grid services and enables:

  • Dynamic workload adaptation in response to grid signals such as load shedding, demand response events, and real-time pricing
  • Renewable and hybrid power orchestration across utility power, onsite renewable generation, and battery storage
  • Revenue opportunities from grid services programs that pay large consumers to reduce load during peak demand

A commercial multi-megawatt pilot is already underway with Emerald AI and Silicon Valley Power, demonstrating grid-responsive AI factory operations that protect workload performance while participating in utility grid programs. This kind of demand-response capability is increasingly important as utilities try to manage the load growth from AI infrastructure.
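A demand-response event boils down to a request to shed a certain amount of load for a period of time. The sketch below shows one plausible way to decide which workloads to pause. It is not the DSX Flex implementation; the Workload type, the priority scheme, and the megawatt figures are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_mw: float
    flexible: bool  # True if it can be paused or slowed, e.g. batch training
    priority: int   # lower number is shed first


def plan_curtailment(workloads: list[Workload], reduce_mw: float) -> list[str]:
    """Pick flexible workloads to pause until the requested reduction is met."""
    shed: list[str] = []
    saved = 0.0
    candidates = sorted(
        (w for w in workloads if w.flexible), key=lambda w: w.priority
    )
    for w in candidates:
        if saved >= reduce_mw:
            break
        shed.append(w.name)
        saved += w.power_mw
    if saved < reduce_mw:
        raise ValueError("not enough flexible load to meet the request")
    return shed


fleet = [
    Workload("inference", 30.0, flexible=False, priority=9),
    Workload("batch-train", 20.0, flexible=True, priority=1),
    Workload("eval", 5.0, flexible=True, priority=2),
    Workload("finetune", 10.0, flexible=True, priority=3),
]
print(plan_curtailment(fleet, reduce_mw=22.0))
```

Latency-sensitive inference is marked inflexible and is never touched, which mirrors the goal of protecting workload performance while still responding to the grid.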

DSX Exchange

DSX Exchange is the integration layer that connects signals across the full operational stack of an AI factory. It enables scalable, secure data exchange between:

  • IT systems (compute, networking, storage management)
  • Operational technology systems (power distribution, cooling plant, UPS)
  • Operations agents and automation platforms

The practical value is visibility and coordination. A temperature spike in one part of the cooling plant can trigger workload migration before GPUs are throttled. A power anomaly can be correlated with workload behavior in real time. DSX Exchange provides the plumbing for this kind of cross-layer intelligence.
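The cooling example above can be sketched as a small publish/subscribe flow. This is a toy in-process bus, not the DSX Exchange protocol or API; the SignalBus class, the topic name, and the temperature thresholds are hypothetical values chosen for the illustration.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class SignalBus:
    """Tiny in-process publish/subscribe bus standing in for an integration layer."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers[topic]:
            handler(event)


THROTTLE_C = 50.0  # assumed coolant temperature at which GPUs would throttle
MARGIN_C = 3.0     # act this many degrees before the throttle point

migrations: list[str] = []  # zones whose workloads should be moved


def on_coolant_temp(event: dict) -> None:
    """IT-side reaction to an OT-side reading: migrate before throttling starts."""
    if event["temp_c"] >= THROTTLE_C - MARGIN_C:
        migrations.append(event["zone"])


bus = SignalBus()
bus.subscribe("cooling/coolant_temp", on_coolant_temp)
bus.publish("cooling/coolant_temp", {"zone": "row-3", "temp_c": 44.0})
bus.publish("cooling/coolant_temp", {"zone": "row-7", "temp_c": 48.5})
print(migrations)
```

The cooling plant publishes readings without knowing who consumes them, and the scheduler reacts without polling the plant. That decoupling is what lets IT and operational technology systems coordinate at scale.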


The Ecosystem

NVIDIA is not building DSX alone. The platform’s value is directly tied to how many vendors align their products to it. At launch, the ecosystem covers every critical layer.

Cloud and Infrastructure Providers

The following cloud providers are deploying core DSX platform components including DSX Sim, DSX MaxLPS, and DSX OS:

  • CoreWeave
  • Crusoe
  • Firmus
  • IREN
  • Lambda
  • Nebius
  • Nscale
  • Yotta Data Services

These are not academic deployments. These providers are using DSX to reduce risk, improve GPU utilization, and bring AI cloud capacity online faster.

System Manufacturers

The following companies are building NVIDIA DSX-ready systems and contributing simulation-ready assets:

  • Dell Technologies
  • HPE
  • Lenovo
  • Supermicro
  • ASUS
  • Foxconn
  • GIGABYTE
  • Pegatron
  • Quanta Cloud Technology (QCT)
  • Wistron
  • Wiwynn

“DSX-ready” means that these manufacturers have validated their hardware against the DSX Reference Design and can deliver complete, full-stack AI factory solutions.

Software Partners

Partners integrating DSX OS components for lifecycle management, security, health automation, multi-tenancy, and platform services include:

  • Red Hat
  • Mirantis
  • Rafay
  • Spectro Cloud
  • OpenNebula Systems
  • vCluster
  • Supermicro
  • Vultr
  • Aible
  • BeyondAI
  • Bhashini
  • DCAI
  • Sarvam
  • Simplismart

Simulation and Engineering Partners

  • Dassault Systèmes (AI factory digital twin configurator with QCT and Pegatron)
  • Cadence
  • PTC
  • Siemens

Why This Matters

For Infrastructure Builders

The immediate practical benefit is de-risking large capital commitments. AI factory buildouts involve hundreds of millions of dollars in hardware, construction, and power contracts. DSX Sim and the DSX Reference Design give operators a validated starting point rather than a blank sheet of paper. A design mistake at this scale is enormously expensive, and catching it in simulation or avoiding it through a proven blueprint is far cheaper than discovering it in production.

For Token Economics

The MaxLPS efficiency gains are not incremental. Running up to 40% more GPUs within the same power envelope, with minimal impact on workload performance, directly changes the unit economics of AI inference and training. For cloud providers selling compute by the token or by the hour, this is a meaningful competitive advantage.

For Grid and Power Strategy

DSX Flex represents a more sophisticated approach to energy than most data center operators have taken. Treating the AI factory as a grid-responsive asset rather than a passive consumer opens new revenue streams from demand response programs and reduces the risk of power constraints limiting expansion. As utilities struggle to keep up with AI load growth, operators that can flex their consumption will have better relationships with grid operators and potentially better access to additional power capacity.

For NVIDIA

DSX is a strategic move to deepen NVIDIA’s position beyond chips. By defining the reference architecture for AI factories and building a broad ecosystem of partners aligned to that architecture, NVIDIA makes itself harder to displace even as alternative accelerators emerge. Every infrastructure builder working from NVIDIA’s playbook is a stronger anchor customer than one who just bought GPUs.


The Bigger Picture

Jensen Huang’s statement at GTC Taipei was deliberately broad: “We’re not just shipping chips, we’re giving every infrastructure builder a complete playbook to build AI factories.”

NVIDIA is positioning DSX not as a product but as an industry standard. The reference designs, simulation tools, and open source software are designed to become the default way AI factories are built, in the same way that TCP/IP became the default way networks were built. Whether DSX achieves that level of adoption depends on whether the ecosystem commits to it and whether the tools deliver on their promises at production scale.

The commercial deployments already underway with major cloud providers suggest the foundation is real. The coming 12 to 18 months will show whether DSX becomes the industry’s shared language for AI factory infrastructure or remains one vendor’s framework among many.


Source: https://nvidianews.nvidia.com/news/dsx-infrastructure-ai-factory