<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Containerized AI Execution</title><link>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/</link><description>Recent content in Containerized AI Execution</description><generator>Hugo</generator><language>en</language><atom:link href="https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/index.xml" rel="self" type="application/rss+xml"/><item><title>Deployment of AI-ready Kubernetes</title><link>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/ai_ready_k8s/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/ai_ready_k8s/</guid><description>&lt;p&gt;&lt;a id="ai_ready_k8s"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tools like Kubernetes provide robust orchestration for deploying AI workloads at scale, managing isolation between cluster workloads and allocating GPU resources to AI inference tasks. With the NVIDIA GPU Operator, you can provision the necessary NVIDIA drivers and libraries to make GPU resources available to containers.&lt;/p&gt;
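&lt;p&gt;As a minimal sketch, once the GPU Operator has installed the driver stack and its device plugin advertises GPUs to the kubelet, a container can request one through the &lt;code&gt;nvidia.com/gpu&lt;/code&gt; extended resource (the pod name and image tag below are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Pod requesting a single GPU advertised by the NVIDIA device plugin
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # illustrative tag
      command: ["nvidia-smi"]                     # prints visible GPUs and exits
      resources:
        limits:
          nvidia.com/gpu: 1                       # resource exposed by the GPU Operator
&lt;/code&gt;&lt;/pre&gt;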
&lt;p&gt;Kubernetes embraces multitenancy, supporting isolated namespaces where access by different teams or users is managed with Role-Based Access Control (RBAC) and network policies. As an administrator, you can also enforce limits on GPU usage and other resources consumed per namespace, ensuring fair resource allocation, as in the sketch below.&lt;/p&gt;
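&lt;p&gt;A standard Kubernetes &lt;code&gt;ResourceQuota&lt;/code&gt; can cap the GPUs one team may request (the namespace and quota values are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Cap the total GPU requests allowed in one team's namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a              # illustrative namespace
spec:
  hard:
    requests.nvidia.com/gpu: "4" # at most 4 GPUs requested concurrently
&lt;/code&gt;&lt;/pre&gt;</description></item><item><title>Deployment of NVIDIA Dynamo</title><link>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/nvidia_dynamo/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/nvidia_dynamo/</guid><description>&lt;p&gt;&lt;a id="nvidia_dynamo"&gt;&lt;/a&gt;&lt;/p&gt;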
&lt;p&gt;&lt;a href="https://docs.nvidia.com/dynamo/latest/index.html"&gt;NVIDIA® Dynamo&lt;/a&gt; is a high-performance inference framework for serving AI models agnostically across any framework, architecture, or deployment scale, including multi-node distributed environments. As an agnostic inference engine, it supports different backends such as TRT-LLM, vLLM, and SGLang. Dynamo also allows you to declare inference graphs that deploy different containerized components in a disaggregated way (such as an API frontend, a prefill worker, a decode worker, and a K/V cache) and let them interact to respond efficiently to user queries.&lt;/p&gt;
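&lt;p&gt;As a rough illustration of the disaggregated pattern, each role can run as its own Kubernetes object and scale independently. The sketch below is a generic Kubernetes manifest, not Dynamo's own deployment format; all names, images, and flags are hypothetical, so consult the Dynamo documentation for its actual deployment resources.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Generic sketch: one disaggregated role (a prefill worker) as its own Deployment;
# the API frontend, decode workers, and K/V cache would be analogous objects.
# NOT Dynamo's manifest format; names, image, and flag are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prefill-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      role: prefill
  template:
    metadata:
      labels:
        role: prefill
    spec:
      containers:
        - name: worker
          image: example.com/inference-worker:latest  # hypothetical image
          args: ["--mode=prefill"]                    # hypothetical flag
          resources:
            limits:
              nvidia.com/gpu: 1
&lt;/code&gt;&lt;/pre&gt;</description></item><item><title>Deployment of the NVIDIA KAI Scheduler</title><link>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/nvidia_kai_scheduler/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.opennebula.io/7.2/solutions/ai_factory_blueprints/containerized_ai_execution/nvidia_kai_scheduler/</guid><description>&lt;p&gt;&lt;a id="nvidia_kai_scheduler"&gt;&lt;/a&gt;&lt;/p&gt;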
&lt;p&gt;The &lt;a href="https://github.com/NVIDIA/KAI-Scheduler"&gt;NVIDIA® KAI Scheduler&lt;/a&gt; is an open-source, Kubernetes-native scheduler designed to optimize GPU resource allocation for AI and machine learning workloads at scale. It is capable of managing large GPU clusters and handling demanding, high-throughput workload environments. KAI Scheduler targets both interactive jobs and large-scale training or inference tasks within the same cluster, orchestrating available resources across different users and teams. It also operates alongside other schedulers installed in the cluster.&lt;/p&gt;
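&lt;p&gt;As a minimal sketch, a pod is handed to the KAI Scheduler by setting its &lt;code&gt;schedulerName&lt;/code&gt; and assigning it to a scheduling queue via a label, following the quickstart in the KAI Scheduler repository at the time of writing (the queue name and image are illustrative; verify the exact label key against the repository):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Pod submitted to the KAI Scheduler under a specific queue
apiVersion: v1
kind: Pod
metadata:
  name: training-job
  labels:
    kai.scheduler/queue: team-a        # illustrative queue; key per the project quickstart
spec:
  schedulerName: kai-scheduler         # route scheduling to KAI instead of the default
  containers:
    - name: trainer
      image: example.com/trainer:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1
&lt;/code&gt;&lt;/pre&gt;</description></item></channel></rss>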