<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Direct AI Execution</title><link>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/direct_ai_execution/</link><description>Recent content in Direct AI Execution</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 28 Oct 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/direct_ai_execution/index.xml" rel="self" type="application/rss+xml"/><item><title>Inferencing with vLLM</title><link>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/direct_ai_execution/llm_inference_certification/</link><pubDate>Tue, 28 Oct 2025 00:00:00 +0000</pubDate><guid>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/direct_ai_execution/llm_inference_certification/</guid><description>&lt;p&gt;The &lt;a href="https://docs.vllm.ai/en/latest/"&gt;vLLM&lt;/a&gt; Inference Framework is a production-grade, high-performance inference engine designed for large-scale LLM serving.&lt;/p&gt;
&lt;p&gt;The main characteristics of the vLLM Inference Framework are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Supports single-node deployments with one or more GPUs.&lt;/li&gt;
&lt;li&gt;Uses Python’s native multiprocessing for multi-GPU inference (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;Does not require additional frameworks, such as Ray, unless deploying across multiple nodes, which is out of scope for this benchmarking task.&lt;/li&gt;
&lt;/ul&gt;
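&lt;p&gt;As a minimal sketch of the single-node, multi-GPU case, the vLLM Python API can be driven as shown below; the model name and the two-GPU tensor-parallel setting are illustrative assumptions, not values prescribed by the appliance.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Minimal sketch: offline inference with the vLLM Python API on one node.
# Assumptions: the model name and tensor_parallel_size=2 (two local GPUs)
# are illustrative; adjust them to your appliance configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model
    tensor_parallel_size=2,  # one worker process per GPU via multiprocessing
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what an inference engine does."], params)
for request_output in outputs:
    print(request_output.outputs[0].text)
&lt;/code&gt;&lt;/pre&gt;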
&lt;p&gt;In this guide you will find the necessary steps and best practices to deploy the OpenNebula vLLM appliance and run inference benchmarking to verify its performance.&lt;/p&gt;</description></item><item><title>Fine-Tuning AI Models on NVIDIA Slurm</title><link>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/direct_ai_execution/nvidia_slurm/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/direct_ai_execution/nvidia_slurm/</guid><description>&lt;p&gt;&lt;a id="finetuning_on_slurm_worker"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this tutorial, we will install and configure the OpenNebula &lt;strong&gt;Slurm&lt;/strong&gt; appliance and run an example fine-tuning script.&lt;/p&gt;
&lt;p&gt;We will complete the following high-level steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Install the Slurm appliances (controller and workers) from the OpenNebula marketplace.&lt;/li&gt;
&lt;li&gt;Configure the Slurm worker template with an example fine-tuning job script.&lt;/li&gt;
&lt;li&gt;Submit a fine-tuning job from the &lt;strong&gt;Slurm controller&lt;/strong&gt; with a single command (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
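&lt;p&gt;A hedged sketch of that single submission command follows; the partition name, GPU request, and fine-tuning script path are assumptions for illustration, since the actual job script is defined in the Slurm worker template.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch: submit a fine-tuning batch job to Slurm from the controller.
# Assumptions: the partition name "gpu", the single-GPU request, and the
# script path /opt/examples/finetune.py are illustrative placeholders.
import subprocess

job_script = """#!/bin/bash
#SBATCH --job-name=finetune-example
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --output=finetune-%j.log
srun python3 /opt/examples/finetune.py
"""

# sbatch reads the job script from stdin when no file is given,
# so this is equivalent to a single `sbatch finetune.sh` command.
result = subprocess.run(
    ["sbatch"], input=job_script, text=True,
    capture_output=True, check=True,
)
print(result.stdout.strip())  # e.g. "Submitted batch job 42"
&lt;/code&gt;&lt;/pre&gt;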
&lt;h2 id="before-starting"&gt;Before Starting&lt;/h2&gt;
&lt;p&gt;Before starting this tutorial, you must complete the AI Factory deployment using either on-premises or cloud resources. Follow whichever of the following guides matches your available resources:&lt;/p&gt;</description></item></channel></rss>