Red Hat OpenShift AI (RHOAI) builds on the capabilities of Red Hat OpenShift to provide a single, consistent, enterprise-ready hybrid AI and MLOps platform. It provides tools across the full lifecycle of AI/ML experiments and models including training, serving, monitoring, and managing AI/ML models and AI-enabled applications. This is my personal repository to test and play with some of its most important features.
RHOAI is a product under continuous improvement, so this repo will become outdated at some point. I recommend referring to the official documentation for the latest features, or you can try the official trainings.
Red Hat OpenShift AI (RHOAI) is a platform for data scientists, AI practitioners, developers, machine learning engineers, and operations teams to prototype, build, deploy, and monitor AI models. This is a wide audience with different training needs. For that reason, there are several courses that will help you understand RHOAI from all angles:
- AI262 - Introduction to Red Hat OpenShift AI: About configuring Data Science Projects and Jupyter Notebooks.
- AI263 - Red Hat OpenShift AI Administration: About installing RHOAI, configuring users and permissions, and creating custom notebook images.
- AI264 - Creating Machine Learning Models with Red Hat OpenShift AI: About training models and enhancing the model training.
- AI265 - Deploying Machine Learning Models with Red Hat OpenShift AI: About serving models on RHOAI.
- AI266 - Automating AI/ML Workflows with Red Hat OpenShift AI: About creating Data Science Pipelines with Elyra and Kubeflow Pipelines.
- AI267 - Developing and Deploying AI/ML Applications on Red Hat OpenShift AI: All the previous courses combined.
The following diagram depicts the general architecture of a RHOAI deployment, including the most important components:
- codeflare: CodeFlare is an IBM software stack for developing and scaling machine-learning and Python workloads. It uses and requires the Ray component.
- dashboard: Provides the RHOAI dashboard.
- datasciencepipelines: Enables you to build portable machine learning workflows. It is based on Argo Workflows, so you do not need to install the OCP Pipelines operator.
- kserve: RHOAI uses KServe to serve large language models that can scale based on demand. It requires the OpenShift Serverless and OpenShift Service Mesh operators to be present before enabling the component. It cannot be enabled at the same time as ModelMeshServing.
- kueue: Kueue component configuration. It is not yet in Technology Preview.
- modelmeshserving: KServe also offers a component for general-purpose model serving, called ModelMesh Serving. Activate this component to serve small and medium-sized models. It cannot be enabled at the same time as KServe.
- ray: Component to run data science code in a distributed manner.
- workbenches: Workbenches are containerized and isolated working environments for data scientists to examine data and work with data models. Data scientists can create workbenches from an existing notebook container image to access its resources and properties. Workbenches are associated with container storage to prevent data loss when the workbench container is restarted or deleted.
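All of these components are toggled through the DataScienceCluster resource that the RHOAI operator reconciles. As a hedged sketch (the resource name and component states are illustrative, not this repo's exact configuration), enabling or disabling them looks like this:

oc apply -f - <<EOF
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Managed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed      # requires Serverless + Service Mesh operators
    kueue:
      managementState: Managed
    modelmeshserving:
      managementState: Removed      # cannot be enabled together with kserve (see above)
    ray:
      managementState: Managed
    workbenches:
      managementState: Managed
EOF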
Installing RHOAI is not as simple as installing and configuring other operators on OpenShift. The product integrates with hardware like NVIDIA and Intel GPUs, automates ML workflows and AI training, and deploys LLMs. For that reason, I've created an auto-install.sh script that does everything for you:
- If the installation is IPI AWS, it will create MachineSets for nodes with NVIDIA GPUs (currently, g5.4xlarge).
- Install all the operators that RHOAI depends on:
  - Service Mesh and Serverless, to enable KServe and the Single-Model serving platform.
  - Node Feature Discovery and the NVIDIA GPU Operator, to discover and configure nodes with GPUs.
  - Authorino, to enable token authorization for models deployed with RHOAI.
- Install and configure OpenShift Data Foundation (ODF) in Multicloud Object Gateway (MCG) mode. This is a lightweight alternative that lets us use AWS S3 object storage the same way we would later use object storage on bare metal with ODF.
- Install the actual RHOAI operator and configure the installation with some defaults, enabling NVIDIA acceleration and Single-Model Serving.
- Deploy a new Data Science Project called RHOAI Playground, enabling pipelines and deploying a basic Notebook for testing.
Some of the components deployed in this repo are bound to a specific version of OpenShift. If you want to deploy RHOAI on an older version (for example, 4.17), you have to make the following modifications (a scripted sketch follows the list):
- Change the image of the Node Feature Discovery container to the one for 4.17: in ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml, the .spec.operand.image field should have the value registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.17.
- Change the channel of ODF: in ./ocp-odf/odf-operator/sub-odf-operator.yaml, the value of the .spec.channel field should be stable-4.17.
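A hedged way to script both changes (assuming yq v4 is available; the paths are the ones listed above):

# Pin the NFD operand image to the 4.17 release
yq -i '.spec.operand.image = "registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.17"' \
  ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml
# Pin the ODF subscription channel
yq -i '.spec.channel = "stable-4.17"' ./ocp-odf/odf-operator/sub-odf-operator.yaml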
💡 Tip: The script contains many tasks divided into clear blocks with comments. Use the environment variables, or comment out the blocks you are not interested in.
To automate everything, the script relies on OpenShift GitOps (ArgoCD), so you will need to have it installed before executing the script. Check out my automated installation in the alvarolop/ocp-gitops-playground GitHub repository.
Now, log in to the cluster and just execute the script:
./auto-install.sh

Most of the activities related to RHOAI will require GPU acceleration. For that purpose, we add NVIDIA GPU nodes during the installation process. In this chapter, I collect some information that might be useful for you.
In this automation, we are currently using the AWS g5.2xlarge instance, which, according to the documentation:
Amazon EC2 G5 instances are designed to accelerate graphics-intensive applications and machine learning inference. They can also be used to train simple to moderately complex machine learning models.
The output of the following command will only be visible when you have applied the ArgoCD Application and the Node Feature Discovery operator has scanned the OpenShift nodes:
oc describe node | egrep 'Roles|pci'
Roles: control-plane,master
Roles: worker
feature.node.kubernetes.io/pci-1d0f.present=true
Roles: gpu-worker,worker
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-1d0f.present=true
Roles: control-plane,master
Roles: control-plane,master

pci-10de is the PCI vendor ID assigned to NVIDIA.
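Given that vendor label, you can list only the nodes exposing an NVIDIA PCI device:

oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true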
The NVIDIA GPU Operator automates the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others.
After configuring the Node Feature Discovery Operator and the NVIDIA GPU Operator using GitOps, you need to confirm that the NVIDIA operator is correctly retrieving the GPU information. You can use the following command to confirm that OpenShift is correctly configured:
oc exec -it -n nvidia-gpu-operator $(oc get pod -o wide -l openshift.driver-toolkit=true -o jsonpath="{.items[0].metadata.name}" -n nvidia-gpu-operator) -- nvidia-smi

The output should look like this:
Sat Oct 26 08:47:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 25C P8 22W / 300W | 1MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

If, due to some race condition, RHOAI does not detect the GPU worker, you might need to force it to recalculate. You can do so easily with the following command:
oc delete cm migration-gpu-status -n redhat-ods-applications; sleep 3; oc delete pods -l app=rhods-dashboard -n redhat-ods-applications

Wait a few seconds until the dashboard pods start again, and you will see in the RHOAI web console that the NVIDIA GPU accelerator profile is now listed.
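If you prefer to verify from the CLI, the profile the dashboard shows is backed by an AcceleratorProfile resource (a hedged check; the resource lives in the redhat-ods-applications namespace):

oc get acceleratorprofile -n redhat-ods-applications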
❗ If you want to achieve this properly, please don't miss reading this repo.
Partitioning allows for flexibility in resource management, enabling multiple applications to share a single GPU or dividing a large GPU into smaller, dedicated units for different tasks. For the sake of simplicity and to make the most of the reduced resources, I have enabled the time-slicing configuration. You can check the configuration in rhoai-dependencies/operator-nvidia-gpu.
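For reference, a hedged sketch of what such a time-slicing configuration typically looks like with the NVIDIA GPU Operator (the ConfigMap name, key, and replica count are illustrative; the ClusterPolicy must also reference this ConfigMap via its devicePlugin.config section):

oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: nvidia-gpu-operator
data:
  a10g: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU is advertised as 4 schedulable GPUs
EOF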
How to check that the configuration is applied?
oc get node --selector=nvidia.com/gpu.product="NVIDIA-A10G-SHARED" -o json | jq '.items[0].metadata.labels' | grep nvidia

Also, you can check these two blog entries with an analysis from the RH Performance team about this topic:
The DataSciencePipelineApplication requires an S3-compatible storage solution to store artifacts generated in the pipeline. You can use any S3-compatible storage solution for data science pipelines, including AWS S3, OpenShift Data Foundation, or MinIO. The automation currently uses ODF with NooBaa to interact with the AWS S3 interface, so you won't need to do anything. Nevertheless, if you decide to disable ODF, you will need to create buckets on AWS S3 manually using the following process:
- Define the configuration variables for AWS in a file named aws-env-vars. You can use the same structure as in aws-env-vars.example.
- Execute the following command to interact with the AWS API:
  ./prerequisites/s3-bucket/create-aws-s3-bucket.sh
- Or execute the following command if you interact with MinIO:
  ./prerequisites/s3-bucket/create-minio-s3-bucket.sh
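Whichever storage you pick, the DataSciencePipelineApplication ends up pointing at the bucket through its objectStorage section. A hedged sketch of such a resource (field names from the v1alpha1 API; bucket, host, and secret values are illustrative):

oc apply -f - <<EOF
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa
spec:
  objectStorage:
    externalStorage:
      bucket: my-pipelines-bucket      # illustrative bucket name
      host: s3.amazonaws.com
      region: us-east-1
      scheme: https
      s3CredentialsSecret:
        secretName: aws-credentials    # illustrative secret holding the keys
        accessKey: AWS_ACCESS_KEY_ID
        secretKey: AWS_SECRET_ACCESS_KEY
EOF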
ℹ️ This is already included in the automation, so you don't have to do anything in this section.
By default, the Single-Stack Serving in OpenShift AI uses a self-signed certificate generated at installation for the endpoints created when deploying a server. This can be counter-intuitive because, if you already have certificates configured on your OpenShift cluster, they are used by default for other types of endpoints like Routes.
See the following blog entry to understand what is done in the automation.
This setup includes a preconfigured OpenShift project designed as a RHOAI Data Science Project. Explore and experiment with prebuilt pipelines to unlock powerful data analysis capabilities. Dive into RHOAI Pipelines and Experiments by clicking the button below:
You can use the distributed workloads feature to queue, scale, and manage the resources required to run data science workloads across multiple nodes in an OpenShift cluster simultaneously. The following three components need to be enabled in the RHOAI installation configuration (a minimal queue sketch follows the list):
- CodeFlare: Secures deployed Ray clusters and grants access to their URLs.
- KubeRay: Manages remote Ray clusters on OpenShift for running distributed compute workloads.
- Kueue: Manages quotas and how distributed workloads consume them, and manages the queueing of distributed workloads with respect to quotas.
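Once Kueue is enabled, workloads are admitted through queue resources. A hedged minimal sketch using the Kueue v1beta1 API (names, namespace, and quotas are illustrative):

oc apply -f - <<EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}   # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 8
      - name: "memory"
        nominalQuota: 32Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: my-project   # hypothetical Data Science Project namespace
spec:
  clusterQueue: cluster-queue
EOF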
If you want to try this feature, I recommend following the RH documentation, which points to the following Guided Demos.
- Documentation: Installation guide.
- Documentation: Configuration guide.
- Documentation: Usage guide.
After everything is configured, you can use the model tuning example from the Helm chart to see some stats:
helm template ./rhoai-environment-chart \
-s templates/modelTunning/cm-training-config.yaml \
-s templates/modelTunning/cm-twitter-complaints.yaml \
-s templates/modelTunning/pvc-trained-model.yaml \
-s templates/modelTunning/pytorchjob-demo.yaml \
--set modelTunning.enabled=true | oc apply -f -

You can also see some stats from the RHOAI dashboard:
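Before opening the dashboard, a hedged CLI check that the tuning job is progressing (the example creates a PyTorchJob through the Training Operator; the job name 'demo' is an assumption based on pytorchjob-demo.yaml):

# List PyTorchJobs across all namespaces
oc get pytorchjob -A
# Follow the training pods of the job (label set by the Training Operator)
oc get pods -A -l training.kubeflow.org/job-name=demo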
OpenShift AI now includes the possibility to deploy a model registry to store community and customized AI models. This model registry uses a MySQL database as the backend to store metadata and artifacts from your applications. Once deployed, your training pipelines can add an extra step that pushes model metadata to the registry.
Using RHOAI Model Registry you have a centralized source of models as well as a simple way to deploy prepared models:
Here you can find examples of REST requests to query model metadata:
MODEL_REGISTRY_NAME=default
MODEL_REGISTRY_HOST=$(oc get route default-https -n rhoai-model-registries -o go-template='https://{{.spec.host}}')
TOKEN=$(oc whoami -t)
# List models
curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_models?pageSize=100&orderBy=ID&sortOrder=DESC" \
-H "accept: application/json" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# Get a registered model and list its versions
MODEL_NAME="test"
MODEL_ID="4"
curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_model?name=${MODEL_NAME}&externalId=${MODEL_ID}" \
-H "accept: application/json" \
-H "Authorization: Bearer ${TOKEN}" | jq .
curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha/registered_models/${MODEL_ID}/versions?name=${MODEL_NAME}&pageSize=100&orderBy=ID&sortOrder=DESC" \
-H "accept: application/json" \
-H "Authorization: Bearer ${TOKEN}" | jq .If you want to try this feature, I recommend you to follow the RH documentation:
- Documentation step 1: Configuring the model registry component.
- Documentation step 2: Managing model registries.
- Documentation step 3: Working with model registries.
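For completeness, a hedged sketch of registering a model through the same REST API used in the query examples above (the payload fields are assumptions based on the Model Registry REST spec):

curl -s -X 'POST' "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_models" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{"name": "my-model", "description": "Registered from the CLI"}' | jq .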
To ensure that machine-learning models are transparent, fair, and reliable, data scientists can use TrustyAI in OpenShift AI to monitor their data science models for bias and data drift.
TRUSTY_ROUTE=$(oc get route/trustyai-service --template="https://{{.spec.host}}")
# WIP

As the Model Registry is still Tech Preview, we keep documentation about how to manually sync models using an OCP Job and then serve them with OpenShift AI. You can use the following Application, which points to a Helm chart that automates it:
oc apply -f application-serve-mistral-7b.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n mistral-7b

oc apply -f application-serve-granite-1b-a400m.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n granite-1b-a400m

oc apply -f application-serve-nomic-embed-text-v1.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n nomic-embed-text-v1

# Retrieve certificates
openssl s_client -showcerts -connect mistral-7b.mistral-7b.svc.cluster.local:443 </dev/null
# Check models endpoint
curl --cacert /etc/pki/ca-trust/source/anchors/service-ca.crt https://mistral-7b.mistral-7b.svc.cluster.local:443/v1/models
# Check Completion (It might be /v1/chat/completions)
curl -s -X 'POST' https://mistral-7b.mistral-7b.svc.cluster.local/v1/completions -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistral-7b","prompt": "San Francisco is a"}'
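# Chat completions (hedged sketch: follows the OpenAI chat API convention referenced in the comment above)
curl -s -X 'POST' https://mistral-7b.mistral-7b.svc.cluster.local/v1/chat/completions -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistral-7b","messages": [{"role": "user", "content": "San Francisco is a"}]}'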
# Embeddings
curl -s -X 'POST' \
"https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/v1/embeddings" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "nomic-embed-text-v1",
"input": ["En un lugar de la Mancha..."]
}'
# API Endpoints:
# * Ollama => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/api/embed
# * OpenAI => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/embeddings

💡 This section is already fully automated in the GitOps deployment during auto-install.sh, but if you need to deploy it manually, you can follow the steps in this section.
This section will guide you through how we deploy ODF to provide internal S3 storage on the cluster.
❗ Make sure to have at least three worker nodes!
- Install the ODF operator:
  oc apply -k ocp-odf/odf-operator
- Install the ODF cluster:
  oc apply -f ocp-odf/storagecluster-ocs-storagecluster.yaml
- Install RadosGW to provide S3 storage based on Ceph on OCP clusters deployed on cloud providers:
  oc apply -k ocp-odf/radosgw
This workshop guide is a good read to understand the RadosGW configuration.
ℹ️ If you want to test your ODF deployment, not with a real use case but with a fun example, >> Click Here <<
Let’s now test our configuration and create a bucket to store a model in ODF.
- Create a bucket:
  oc apply -k ocp-odf/rhoai-models
- Create a secret with the credentials:
  oc create secret generic hf-creds --from-env-file=hf-creds -n rhoai-models
You just need to retrieve the credentials for the bucket and point to the bucket route URL:
export AWS_ACCESS_KEY_ID=$(oc get secret models -n rhoai-models -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 --decode)
export AWS_SECRET_ACCESS_KEY=$(oc get secret models -n rhoai-models -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 --decode)
export BUCKET_HOST=$(oc get route s3-rgw -n openshift-storage --template='{{ .spec.host }}')
export BUCKET_PORT=$(oc get configmap models -n rhoai-models -o jsonpath='{.data.BUCKET_PORT}')
export BUCKET_NAME="models"
export MODEL_NAME="ibm-granite/granite-3.0-1b-a400m-instruct"

And then execute normal aws-cli commands against the bucket:
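# Hedged example: upload a locally downloaded model into the bucket
# (assumes the model was first fetched into ./models, e.g., with huggingface-cli)
aws s3 sync ./models/${MODEL_NAME} s3://${BUCKET_NAME}/${MODEL_NAME}/ --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT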
aws s3 ls s3://${BUCKET_NAME}/$MODEL_NAME/ --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT

Red Hat OpenShift Lightspeed is a generative AI-powered virtual assistant for OpenShift Container Platform. Lightspeed provides a natural-language interface in the OpenShift web console.
oc apply -f application-ocp-lightspeed.yaml

Or you can deploy it manually with the following command:
oc apply -k components/ocp-lightspeed

The Llama Stack Playground is an application that allows you to test the Llama Stack LLM and Agent capabilities.
cat application-lls-playground.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
LLS_ENDPOINT="http://llama-stack-service.intelligent-cd.svc.cluster.local:8321" \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/lls-playground \
--set global.clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
--set llamaStack.endpoint="http://llama-stack-service.intelligent-cd.svc.cluster.local:8321" \
| oc apply -f -

This demo is fully oriented toward the default, production-ready capabilities provided by OpenShift. However, if your current deployment already uses MinIO and you cannot change it, you can optionally deploy a MinIO application in a side namespace using the following ArgoCD application. This application is included in the auto-install.sh automation:
cat application-minio.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
MINIO_NAMESPACE="minio" MINIO_SERVICE_NAME="minio" \
MINIO_ADMIN_USERNAME="minio" MINIO_ADMIN_PASSWORD="minio123" \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/minio \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
--set namespace="minio" --set service.name="minio" \
--set adminUser.username="minio" --set adminUser.password="minio123" | oc apply -f -

The user and password are minio / minio123.
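To verify the deployment from the CLI, a hedged example using aws-cli against the MinIO route (the route name minio in the namespace minio is an assumption derived from the Helm values above):

export AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123
MINIO_HOST=$(oc get route minio -n minio -o jsonpath='{.spec.host}')   # route name assumed
aws s3 mb s3://test-bucket --endpoint-url https://$MINIO_HOST
aws s3 ls --endpoint-url https://$MINIO_HOST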
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.
cat application-open-webui.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
LLM_INFERENCE_SERVICE_URL="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/open-webui --namespace="open-webui" \
--set llmInferenceService.url="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
--set rag.enabled="true" | oc apply -f -

Milvus is a vector database built for scalable similarity search. It is "open-source, highly scalable, and blazing fast". Milvus offers robust data modeling capabilities, enabling you to organize your unstructured or multi-modal data into structured collections.
Attu is an efficient open-source management tool for Milvus. It features an intuitive graphical user interface (GUI), allowing you to easily interact with your databases.
cat application-milvus.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/milvus --namespace="milvus" \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') | oc apply -f -

The credentials for the Attu GUI are root / Milvus.
- https://redhatquickcourses.github.io/rhods-admin/rhods-admin/1.33
- https://redhatquickcourses.github.io/rhods-intro/rhods-intro/1.33
- https://redhatquickcourses.github.io/rhods-model/rhods-model/1.33
- https://rh-aiservices-bu.github.io/insurance-claim-processing/modules/02-03-creating-workbench.html
- https://developers.redhat.com/products/red-hat-openshift-ai/getting-started
Known issue: the Model Registry metadata store (MLMD) may fail to start with the following error when the MySQL backend is not ready or uses the MySQL 8.0 default authentication plugin:

Logging before InitGoogleLogging() is written to STDERR
E0223 23:48:09.323659 1 mysql_metadata_source.cc:174] MySQL database was not initialized. Please ensure your MySQL server is running. Also, this error might be caused by the fact that, starting from MySQL 8.0, mysql_native_password used by MLMD is not supported as a default authentication plugin. Please follow https://dev.mysql.com/blog-archive/upgrading-to-mysql-8-0-default-authentication-plugin-considerations/ to fix this issue.
F0223 23:48:09.323763 1 metadata_store_server_main.cc:617] Check failed: absl::OkStatus() == status (OK vs. INTERNAL: mysql_real_connect failed: errno: , error: [mysql-error-info='']) MetadataStore cannot be created with the given connection config.
*** Check failure stack trace: ***