Overview
The AI Factory collection provides a step-by-step process for setting up a simple AI Factory and getting it running quickly, including:
- Identifying the minimum hardware and networking requirements for your AI Factory. These baseline specifications also serve as a reference for more advanced deployments. OpenNebula supports high-performance architectures such as InfiniBand, Spectrum-X, and NVLink, although these setups are not automated and require custom configuration.
- Following the step-by-step deployment instructions using OneDeploy to build your AI Factory, with options for both on-premises and cloud-based deployments.
- Optionally, validating the setup using the same methodology we apply during formal infrastructure acceptance. This validation covers direct vLLM execution for inference, Slurm integration for fine-tuning, and Kubernetes-based execution using NVIDIA Dynamo® for inference and NVIDIA KAI Scheduler® for GPU scheduling.
Basic Outline
Configuring, deploying and validating a high-performance AI infrastructure using OpenNebula involves these steps:
1. Familiarize yourself with Architecture and Specifications. We recommend consulting the guide on GPU PCI-passthrough for details on your GPU hardware and IOMMU.
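Before moving on, it can help to confirm that IOMMU is active and to see how your GPUs are grouped. Below is a minimal Python sketch (not part of the official tooling) that walks the standard Linux sysfs layout to list IOMMU groups and their PCI devices; each GPU you plan to pass through should ideally sit in its own group.

```python
#!/usr/bin/env python3
# Minimal sketch: list IOMMU groups and the PCI devices they contain.
# Assumes a Linux host with IOMMU enabled in the BIOS and on the kernel
# command line (e.g. intel_iommu=on or amd_iommu=on).
from pathlib import Path

IOMMU_ROOT = Path("/sys/kernel/iommu_groups")

if not IOMMU_ROOT.is_dir():
    raise SystemExit("No IOMMU groups found; check BIOS and kernel parameters.")

for group in sorted(IOMMU_ROOT.iterdir(), key=lambda p: int(p.name)):
    devices = sorted(d.name for d in (group / "devices").iterdir())
    print(f"IOMMU group {group.name}: {', '.join(devices)}")
```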
2. Deploy and configure your AI Factory with one of these alternatives (a post-deployment verification sketch follows the list):
- On-premises AI Factory Deployment: Set up an AI Factory using OneDeploy for on-premises environments.
- On-cloud AI Factory Deployment: Set up an AI Factory using OneDeploy on Scaleway for cloud environments.
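Whichever option you choose, once OneDeploy finishes you can sanity-check the result from the front-end. The sketch below uses pyone, the official OpenNebula Python bindings, to list the registered hypervisor hosts; the endpoint URL and credentials are placeholders for your own environment.

```python
# Minimal sketch: confirm the OpenNebula front-end answers XML-RPC calls
# and that the hypervisor hosts registered by OneDeploy are visible.
# Requires the official Python bindings: pip install pyone
import pyone

one = pyone.OneServer(
    "http://frontend.example.com:2633/RPC2",  # placeholder front-end endpoint
    session="oneadmin:opennebula",            # placeholder credentials
)

for host in one.hostpool.info().HOST:
    # STATE 2 (MONITORED) means the host is up and being monitored.
    print(f"{host.NAME}: state {host.STATE}")
```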
3. Perform validation. As a prerequisite, you need an AI Factory deployed through one of the installation procedures above. These are the options to validate your AI Factory:
- LLM Inferencing with vLLM: Run vLLM with two different models at two model sizes, across both H100 and L40S GPUs (a minimal inference sketch follows this list).
- LLM Fine-Tuning with NVIDIA Slurm: Fine-tune an AI model using the OpenNebula NVIDIA Slurm appliance.
- Deployment of AI-Ready Kubernetes: Deploy an AI-ready Kubernetes cluster on H100 and L40S GPUs.
- LLM Inferencing with NVIDIA Dynamo: Integrate the GPU-powered Kubernetes cluster with the NVIDIA Dynamo Cloud Platform to provision and manage AI workloads through the Dynamo framework.
- Scheduling with NVIDIA KAI Scheduler: Use the NVIDIA KAI Scheduler to share GPU resources across different workloads within the AI-ready Kubernetes cluster (a pod-submission sketch follows this list).
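As a flavor of what the vLLM validation step looks like, here is a minimal offline-inference sketch using vLLM's Python API. The model name is only an example; substitute the models and sizes used in your acceptance run, and note that the model must fit on the GPUs available to the VM.

```python
# Minimal sketch: offline batch inference with vLLM's Python API.
from vllm import LLM, SamplingParams

prompts = ["What is an AI Factory?"]
params = SamplingParams(temperature=0.8, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model only
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```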
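For the KAI Scheduler option, workloads opt in by naming the scheduler in their pod spec and tagging a queue. The sketch below uses the official Kubernetes Python client; the scheduler name and queue label follow the upstream KAI Scheduler documentation at the time of writing, so verify both against your installation before relying on them.

```python
# Minimal sketch: submit a one-shot GPU pod placed by NVIDIA KAI Scheduler.
# Requires: pip install kubernetes, and a kubeconfig for the cluster.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="gpu-smoke-test",
        labels={"kai.scheduler/queue": "default-queue"},  # assumed queue label and name
    ),
    spec=client.V1PodSpec(
        scheduler_name="kai-scheduler",  # hand placement to KAI Scheduler
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # example image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
print("Submitted pod gpu-smoke-test; check placement with: kubectl get pod -o wide")
```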