Overview

Overview of AI Factory deployment and validation

The AI Factory collection provides a step-by-step process for setting up a simple AI Factory system and getting it running quickly. It covers:

  • Identifying the minimum hardware and networking requirements for your AI Factory. These baseline specifications also serve as a reference for more advanced deployments. OpenNebula supports high-performance architectures such as InfiniBand, Spectrum-X, and NVLink, although these setups are not automated and require custom configuration.

  • Following the step-by-step deployment instructions using OneDeploy to build your AI Factory, with options for both on-premises installations and cloud-based deployments.

  • Optionally validating your setup using the same methodology we apply during formal infrastructure acceptance. This validation focuses on using AI-ready Kubernetes with NVIDIA Dynamo® or NVIDIA KAI Scheduler®.

Basic Outline

Configuring, deploying and validating a high-performance AI infrastructure using OpenNebula involves these steps:

  1. Familiarize yourself with Architecture and Specifications. We recommend consulting the guide on GPU PCI-passthrough for details relating to your GPU hardware and IOMMU.

  2. Deploy and configure your AI Factory with one of these alternatives:

  3. Perform Validation: as a prerequisite, you must have an AI Factory ready to be validated. These are the options to validate your AI Factory: