# Enabling AI Ops

## Overview
You can enable AI Ops with CloudNatix. The provided features include:
- LLM hosting with an OpenAI-compatible API
- Development environment with Jupyter Notebooks
- GPU federation across multiple GPU clusters
For example, you can host LLMs in your Kubernetes clusters and run chat completions or fine-tuning jobs with the models. You can also submit training jobs to a global K8s cluster hosted by CloudNatix, which schedules jobs across your GPU clusters.
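Because the hosted endpoint speaks the OpenAI API, any OpenAI-style client can talk to it. Below is a minimal sketch using only the Python standard library; the `CLOUDNATIX_API_KEY` environment variable and the `<model-name>` placeholder are illustrative assumptions — substitute the credentials and model names from your own deployment.

```python
import json
import os
import urllib.request

# Base URL of the CloudNatix-hosted, OpenAI-compatible API (see Step 1 below).
BASE_URL = "https://api.llm.cloudnatix.com/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """POST the payload to the chat completions endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            # CLOUDNATIX_API_KEY is a placeholder name for your API key.
            "Authorization": f"Bearer {os.environ['CLOUDNATIX_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a valid API key and a loaded model):
# print(chat("<model-name>", "What is k8s?"))
```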
## Enabling the AI Ops Feature
AI Ops features are enabled by installing LLMariner.
### Prerequisites
- The `cnatix` CLI version v0.841.0 or later
- The `llma` CLI version v1.25.0 or later (installation procedure)
- An S3-compatible object store
LLMariner stores models (including fine-tuned models) in an S3-compatible object store. If you're using an EKS cluster, you need to create an S3 bucket and a corresponding IAM role that allows the LLMariner service account to access the bucket.
Please see the LLMariner page for more information.
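On EKS, this kind of access is typically granted with IAM Roles for Service Accounts (IRSA). The sketch below shows the general shape of the IAM role's trust policy; the account ID, region, OIDC provider ID, namespace, and service-account name are all placeholders — check the LLMariner documentation for the actual service account your installation uses.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:<namespace>:<service-account>"
        }
      }
    }
  ]
}
```

Attach to this role an IAM policy granting read/write access to the S3 bucket.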
### Step 1. Obtain a cluster registration key
Run the following commands; the last one outputs a cluster registration key.
```bash
# Set the API base URL to https://api.llm.cloudnatix.com/v1
llma auth login
llma admin clusters register <cluster-name>
```
We will use this cluster registration key later when installing LLMariner.
If you prefer a GUI, you can also visit https://app.llm.cloudnatix.com and register a new cluster there.
### Step 2. Create a secret for HuggingFace (optional)
If you would like to download models from HuggingFace, create a K8s secret in the `cloudnatix` namespace.
```bash
kubectl create namespace cloudnatix
kubectl create secret generic \
  huggingface-key \
  -n cloudnatix \
  --from-literal=apiKey=${HUGGING_FACE_HUB_TOKEN}
```
In the next step, we will configure LLMariner to use the secret.
### Step 3. Create a `values.yaml` file for LLMariner

Create a `values.yaml` file used by the LLMariner Helm chart.
You can visit ArtifactHub to understand the schema of the `values.yaml` file (by clicking "DEFAULT VALUES" or "VALUES SCHEMA").
Here is an example:
```yaml
global:
  objectStore:
    s3:
      bucket: cloudnatix-aiops
      endpointUrl: ""
      region: us-west-2
  # This is required only when an S3 bucket is accessed with the secret key.
  awsSecret:
    name: aws
    accessKeyIdKey: accessKeyId
    secretAccessKeyKey: secretAccessKey

inference-manager-engine:
  replicaCount: 2
  runtime:
    runtimeImages:
      ollama: mirror.gcr.io/ollama/ollama:0.3.6
      vllm: public.ecr.aws/cloudnatix/llm-operator/vllm-openai:20250115
      triton: nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3
  model:
    default:
      runtimeName: vllm
    overrides:
      NikolayKozloff/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M-GGUF:
        preloaded: true
        resources:
          limits:
            nvidia.com/gpu: 1
        vllmExtraFlags:
          - --tokenizer
          - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
      lmstudio-community/phi-4-GGUF/phi-4-Q4_K_M.gguf:
        preloaded: true
        resources:
          limits:
            nvidia.com/gpu: 1
        vllmExtraFlags:
          - --tokenizer
          - microsoft/phi-4

model-manager-loader:
  baseModels:
    - lmstudio-community/phi-4-GGUF/phi-4-Q4_K_M.gguf
    - NikolayKozloff/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M-GGUF
  downloader:
    kind: huggingFace
    huggingFace:
      cacheDir: /tmp/.cache/huggingface/hub
  huggingFaceSecret:
    name: huggingface-key
    apiKeyKey: apiKey
```
### Step 4. Install

Type `cnatix clusters configure`. You will be asked whether you would like to install LLMariner, and prompted for your cluster registration key and the location of your `values.yaml` file.
Once the command completes, you can follow the regular CloudNatix installation procedure.
### Step 5. Test

Once the installation completes, check the health status of the registered cluster:

```bash
llma admin clusters list
```
You can also see the list of hosted models by typing:

```bash
llma models list
```
Once a model is loaded, you can ask a question to the model:
```bash
llma chat completions create \
  --model <model-name> \
  --role user \
  --completion "What is k8s?"
```
### Note on Fine-tuning Jobs
LLMariner provides a file upload API, but the API is not supported when the LLMariner control plane is hosted by CloudNatix. This is because the CloudNatix Global Cluster Controller does not store customers' training data in its own storage.
You can still use `llma storage files create-link` to create file objects. This command creates file objects without actually uploading the files.
Please visit the LLMariner page for more information.