Overview
The AI Factory collection provides a step-by-step process for setting up a simple AI Factory and getting it running quickly, including:
- Identifying the minimum hardware and networking requirements for your AI Factory. These baseline specifications also serve as a reference for more advanced deployments. OpenNebula supports high-performance architectures such as InfiniBand, Spectrum-X, and NVLink, although these setups are not automated and require custom configuration.
- Following the step-by-step deployment instructions using OneDeploy to build your AI Factory, with options for both on-premises and cloud-based deployments.
- Optionally, validating the setup using the same methodology we apply during formal infrastructure acceptance. This validation covers direct vLLM execution for inference, Slurm integration for fine-tuning, and Kubernetes-based execution using NVIDIA Dynamo® for inference and NVIDIA KAI Scheduler® for GPU scheduling.
Basic Outline
Configuring, deploying and validating a high-performance AI infrastructure using OpenNebula involves these steps:
1. Familiarize yourself with Architecture and Specifications. We recommend consulting the guide on GPU PCI-passthrough for details on your GPU hardware and IOMMU.
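Before moving on, it can help to confirm that IOMMU is active and to see how your GPUs are grouped. Below is a minimal Python sketch (not part of the official tooling) that walks the standard Linux sysfs layout to list IOMMU groups and their PCI devices; each GPU you plan to pass through should ideally sit in its own group.

```python
#!/usr/bin/env python3
# Minimal sketch: list IOMMU groups and the PCI devices they contain.
# Assumes a Linux host with IOMMU enabled in the BIOS and on the kernel
# command line (e.g. intel_iommu=on or amd_iommu=on).
from pathlib import Path

IOMMU_ROOT = Path("/sys/kernel/iommu_groups")

if not IOMMU_ROOT.is_dir():
    raise SystemExit("No IOMMU groups found; check BIOS and kernel parameters.")

for group in sorted(IOMMU_ROOT.iterdir(), key=lambda p: int(p.name)):
    devices = sorted(d.name for d in (group / "devices").iterdir())
    print(f"IOMMU group {group.name}: {', '.join(devices)}")
```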
2. Deploy and configure your AI Factory with one of these alternatives (a post-deployment verification sketch follows the list):
- On-premises AI Factory Deployment: Set up an AI Factory using OneDeploy for on-premises environments.
- On-cloud AI Factory Deployment: Set up an AI Factory using OneDeploy on Scaleway for cloud environments.
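Whichever option you choose, once OneDeploy finishes you can sanity-check the result from the front-end. The sketch below uses pyone, the official OpenNebula Python bindings, to list the registered hypervisor hosts; the endpoint URL and credentials are placeholders for your own environment.

```python
# Minimal sketch: confirm the OpenNebula front-end answers XML-RPC calls
# and that the hypervisor hosts registered by OneDeploy are visible.
# Requires the official Python bindings: pip install pyone
import pyone

one = pyone.OneServer(
    "http://frontend.example.com:2633/RPC2",  # placeholder front-end endpoint
    session="oneadmin:opennebula",            # placeholder credentials
)

for host in one.hostpool.info().HOST:
    # STATE 2 (MONITORED) means the host is up and being monitored.
    print(f"{host.NAME}: state {host.STATE}")
```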
3. Perform validation. As a prerequisite, you need an AI Factory deployed through one of the installation procedures above. These are the options to validate your AI Factory:
- LLM Inferencing with vLLM: Run vLLM with two different models at two model sizes, across both H100 and L40S GPUs (a minimal inference sketch follows this list).
- LLM Fine-Tuning with NVIDIA Slurm: Fine-tune an AI model using the OpenNebula NVIDIA Slurm appliance.
- Deployment of AI-Ready Kubernetes: Deploy an AI-ready Kubernetes cluster on H100 and L40S GPUs.
- LLM Inferencing with NVIDIA Dynamo: Integrate the GPU-powered Kubernetes cluster with the NVIDIA Dynamo Cloud Platform to provision and manage AI workloads through the Dynamo framework.
- Scheduling with NVIDIA KAI Scheduler: Use the NVIDIA KAI Scheduler to share GPU resources across different workloads within the AI-ready Kubernetes cluster (a pod-submission sketch follows this list).
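As a flavor of what the vLLM validation step looks like, here is a minimal offline-inference sketch using vLLM's Python API. The model name is only an example; substitute the models and sizes used in your acceptance run, and note that the model must fit on the GPUs available to the VM.

```python
# Minimal sketch: offline batch inference with vLLM's Python API.
from vllm import LLM, SamplingParams

prompts = ["What is an AI Factory?"]
params = SamplingParams(temperature=0.8, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model only
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```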
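For the KAI Scheduler option, workloads opt in by naming the scheduler in their pod spec and tagging a queue. The sketch below uses the official Kubernetes Python client; the scheduler name and queue label follow the upstream KAI Scheduler documentation at the time of writing, so verify both against your installation before relying on them.

```python
# Minimal sketch: submit a one-shot GPU pod placed by NVIDIA KAI Scheduler.
# Requires: pip install kubernetes, and a kubeconfig for the cluster.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="gpu-smoke-test",
        labels={"kai.scheduler/queue": "default-queue"},  # assumed queue label and name
    ),
    spec=client.V1PodSpec(
        scheduler_name="kai-scheduler",  # hand placement to KAI Scheduler
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # example image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
print("Submitted pod gpu-smoke-test; check placement with: kubectl get pod -o wide")
```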