Red Hat OpenShift AI (RHOAI) builds on the capabilities of Red Hat OpenShift to provide a single, consistent, enterprise-ready hybrid AI and MLOps platform. It provides tools across the full lifecycle of AI/ML experiments and models including training, serving, monitoring, and managing AI/ML models and AI-enabled applications. This is my personal repository to test and play with some of its most important features.
RHOAI is a product under continuous improvement, so this repo will become outdated at some point. I recommend referring to the official documentation for the latest features, or you can try the official trainings.
Red Hat OpenShift AI (RHOAI) is a platform for data scientists, AI practitioners, developers, machine learning engineers, and operations teams to prototype, build, deploy, and monitor AI models. This is a wide audience with different training needs. For that reason, there are several courses that will help you understand RHOAI from all angles:
- AI262 - Introduction to Red Hat OpenShift AI: About configuring Data Science Projects and Jupyter Notebooks.
- AI263 - Red Hat OpenShift AI Administration: About installing RHOAI, configuring users and permissions, and creating custom notebook images.
- AI264 - Creating Machine Learning Models with Red Hat OpenShift AI: About training models and enhancing the model training.
- AI265 - Deploying Machine Learning Models with Red Hat OpenShift AI: About serving models on RHOAI.
- AI266 - Automating AI/ML Workflows with Red Hat OpenShift AI: About creating Data Science Pipelines with Elyra and Kubeflow Pipelines.
- AI267 - Developing and Deploying AI/ML Applications on Red Hat OpenShift AI: All the previous courses combined.
The following diagram depicts the general architecture of a RHOAI deployment, including the most important components:
- codeflare: CodeFlare is an IBM software stack for developing and scaling machine-learning and Python workloads. It uses and requires the Ray component.
- dashboard: Provides the RHOAI dashboard.
- datasciencepipelines: Enables you to build portable machine learning workflows. It is based on Argo Workflows, so you do not need to install the OCP Pipelines operator.
- kserve: RHOAI uses KServe to serve large language models that can scale based on demand. It requires the OpenShift Serverless and OpenShift Service Mesh operators to be present before enabling the component. It cannot be enabled at the same time as ModelMeshServing.
- kueue: Kueue component configuration. It is not yet in Technology Preview.
- modelmeshserving: KServe also offers a component for general-purpose model serving, called ModelMesh Serving. Activate this component to serve small and medium-sized models. It cannot be enabled at the same time as KServe.
- ray: Component to run data science code in a distributed manner.
- workbenches: Workbenches are containerized and isolated working environments for data scientists to examine data and work with data models. Data scientists can create workbenches from an existing notebook container image to access its resources and properties. Workbenches are associated with container storage to prevent data loss when the workbench container is restarted or deleted.
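All of these components are toggled through the DataScienceCluster resource that the RHOAI operator reconciles. As a hedged sketch (the resource name and component states are illustrative, not this repo's exact configuration), enabling or disabling them looks like this:

oc apply -f - <<EOF
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Managed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed      # requires Serverless + Service Mesh operators
    kueue:
      managementState: Managed
    modelmeshserving:
      managementState: Removed      # cannot be enabled together with kserve (see above)
    ray:
      managementState: Managed
    workbenches:
      managementState: Managed
EOF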
Installing RHOAI is not as simple as installing and configuring other operators on OpenShift. The product integrates with hardware like NVIDIA and Intel GPUs, automates ML workflows and AI training, and deploys LLMs. For that reason, I've created an auto-install.sh script that does everything for you:
- If the installation is IPI AWS, it will create MachineSets for nodes with NVIDIA GPUs (currently, g5.4xlarge).
- Install all the operators that RHOAI depends on:
  - Service Mesh and Serverless, to enable KServe and the Single-Model serving platform.
  - Node Feature Discovery and the NVIDIA GPU Operator, to discover and configure nodes with GPUs.
  - Authorino, to enable token authorization for models deployed with RHOAI.
- Install and configure OpenShift Data Foundation (ODF) in Multicloud Object Gateway (MCG) mode. This is a lightweight alternative that lets us use AWS S3 object storage the same way we would later use object storage on bare metal with ODF.
- Install the actual RHOAI operator and configure the installation with some defaults, enabling NVIDIA acceleration and Single-Model Serving.
- Deploy a new Data Science Project called RHOAI Playground, enabling pipelines and deploying a basic Notebook for testing.
Some of the components deployed in this repo are bound to a specific version of OpenShift. If you want to deploy RHOAI on an older version (for example, 4.17), you have to make the following modifications (a scripted sketch follows the list):
- Change the image of the Node Feature Discovery container to the one for 4.17: in ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml, the .spec.operand.image field should have the value registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.17.
- Change the channel of ODF: in ./ocp-odf/odf-operator/sub-odf-operator.yaml, the value of the .spec.channel field should be stable-4.17.
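A hedged way to script both changes (assuming yq v4 is available; the paths are the ones listed above):

# Pin the NFD operand image to the 4.17 release
yq -i '.spec.operand.image = "registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.17"' \
  ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml
# Pin the ODF subscription channel
yq -i '.spec.channel = "stable-4.17"' ./ocp-odf/odf-operator/sub-odf-operator.yaml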
💡 Tip: The script contains many tasks divided into clear blocks with comments. Use the environment variables, or comment out the blocks you are not interested in.
To automate everything, the script relies on OpenShift GitOps (ArgoCD), so you will need to have it installed before executing the script. Check out my automated installation in the alvarolop/ocp-gitops-playground GitHub repository.
Now, log in to the cluster and just execute the script:
./auto-install.sh

Most of the activities related to RHOAI will require GPU acceleration. For that purpose, we add NVIDIA GPU nodes during the installation process. In this chapter, I collect some information that might be useful for you.
In this automation, we are currently using the AWS g5.2xlarge instance, which, according to the documentation:
Amazon EC2 G5 instances are designed to accelerate graphics-intensive applications and machine learning inference. They can also be used to train simple to moderately complex machine learning models.
The output of the following command will only be visible when you have applied the ArgoCD Application and the Node Feature Discovery operator has scanned the OpenShift nodes:
oc describe node | egrep 'Roles|pci'
Roles: control-plane,master
Roles: worker
feature.node.kubernetes.io/pci-1d0f.present=true
Roles: gpu-worker,worker
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-1d0f.present=true
Roles: control-plane,master
Roles: control-plane,master

pci-10de is the PCI vendor ID assigned to NVIDIA.
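Given that vendor label, you can list only the nodes exposing an NVIDIA PCI device:

oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true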
The NVIDIA GPU Operator automates the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others.
After configuring the Node Feature Discovery Operator and the NVIDIA GPU Operator using GitOps, you need to confirm that the NVIDIA operator is correctly retrieving the GPU information. You can use the following command to confirm that OpenShift is correctly configured:
oc exec -it -n nvidia-gpu-operator $(oc get pod -o wide -l openshift.driver-toolkit=true -o jsonpath="{.items[0].metadata.name}" -n nvidia-gpu-operator) -- nvidia-smi

The output should look like this:
Sat Oct 26 08:47:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 25C P8 22W / 300W | 1MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+

If, due to some race condition, RHOAI does not detect the GPU worker, you might need to force it to recalculate. You can do so easily with the following command:
oc delete cm migration-gpu-status -n redhat-ods-applications; sleep 3; oc delete pods -l app=rhods-dashboard -n redhat-ods-applications

Wait a few seconds until the dashboard pods start again, and you will see in the RHOAI web console that the NVIDIA GPU accelerator profile is now listed.
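If you prefer to verify from the CLI, the profile the dashboard shows is backed by an AcceleratorProfile resource (a hedged check; the resource lives in the redhat-ods-applications namespace):

oc get acceleratorprofile -n redhat-ods-applications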
❗ If you want to achieve this properly, please don't miss reading this repo.
Partitioning allows for flexibility in resource management, enabling multiple applications to share a single GPU or dividing a large GPU into smaller, dedicated units for different tasks. For the sake of simplicity and to make the most of the reduced resources, I have enabled the time-slicing configuration. You can check the configuration in rhoai-dependencies/operator-nvidia-gpu.
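For reference, a hedged sketch of what such a time-slicing configuration typically looks like with the NVIDIA GPU Operator (the ConfigMap name, key, and replica count are illustrative; the ClusterPolicy must also reference this ConfigMap via its devicePlugin.config section):

oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: nvidia-gpu-operator
data:
  a10g: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # each physical GPU is advertised as 4 schedulable GPUs
EOF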
How to check that the configuration is applied?
oc get node --selector=nvidia.com/gpu.product="NVIDIA-A10G-SHARED" -o json | jq '.items[0].metadata.labels' | grep nvidia

Also, you can check these two blog entries with an analysis from the RH Performance team about this topic:
The DataSciencePipelineApplication requires an S3-compatible storage solution to store artifacts generated in the pipeline. You can use any S3-compatible storage solution for data science pipelines, including AWS S3, OpenShift Data Foundation, or MinIO. The automation currently uses ODF with NooBaa to interact with the AWS S3 interface, so you won't need to do anything. Nevertheless, if you decide to disable ODF, you will need to create buckets on AWS S3 manually using the following process:
- Define the configuration variables for AWS in a file named aws-env-vars. You can use the same structure as in aws-env-vars.example.
- Execute the following command to interact with the AWS API:
  ./prerequisites/s3-bucket/create-aws-s3-bucket.sh
- Or execute the following command if you interact with MinIO:
  ./prerequisites/s3-bucket/create-minio-s3-bucket.sh
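Whichever storage you pick, the DataSciencePipelineApplication ends up pointing at the bucket through its objectStorage section. A hedged sketch of such a resource (field names from the v1alpha1 API; bucket, host, and secret values are illustrative):

oc apply -f - <<EOF
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa
spec:
  objectStorage:
    externalStorage:
      bucket: my-pipelines-bucket      # illustrative bucket name
      host: s3.amazonaws.com
      region: us-east-1
      scheme: https
      s3CredentialsSecret:
        secretName: aws-credentials    # illustrative secret holding the keys
        accessKey: AWS_ACCESS_KEY_ID
        secretKey: AWS_SECRET_ACCESS_KEY
EOF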
ℹ️ This is already included in the automation, so you don't have to do anything in this section.
By default, the Single-Stack Serving in OpenShift AI uses a self-signed certificate generated at installation for the endpoints created when deploying a server. This can be counter-intuitive because, if you already have certificates configured on your OpenShift cluster, they are used by default for other types of endpoints like Routes.
See the following blog entry to understand what is done in the automation.
This setup includes a preconfigured OpenShift project designed as a RHOAI Data Science Project. Explore and experiment with prebuilt pipelines to unlock powerful data analysis capabilities. Dive into RHOAI Pipelines and Experiments by clicking the button below:
You can use the distributed workloads feature to queue, scale, and manage the resources required to run data science workloads across multiple nodes in an OpenShift cluster simultaneously. The following three components need to be enabled in the RHOAI installation configuration (a minimal queue sketch follows the list):
- CodeFlare: Secures deployed Ray clusters and grants access to their URLs.
- KubeRay: Manages remote Ray clusters on OpenShift for running distributed compute workloads.
- Kueue: Manages quotas and how distributed workloads consume them, and manages the queueing of distributed workloads with respect to quotas.
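Once Kueue is enabled, workloads are admitted through queue resources. A hedged minimal sketch using the Kueue v1beta1 API (names, namespace, and quotas are illustrative):

oc apply -f - <<EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}   # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 8
      - name: "memory"
        nominalQuota: 32Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: my-project   # hypothetical Data Science Project namespace
spec:
  clusterQueue: cluster-queue
EOF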
If you want to try this feature, I recommend following the RH documentation, which points to the following Guided Demos.
- Documentation: Installation guide.
- Documentation: Configuration guide.
- Documentation: Usage guide.
After everything is configured, you can use the model tuning example from the Helm chart to see some stats:
helm template ./rhoai-environment-chart \
-s templates/modelTunning/cm-training-config.yaml \
-s templates/modelTunning/cm-twitter-complaints.yaml \
-s templates/modelTunning/pvc-trained-model.yaml \
-s templates/modelTunning/pytorchjob-demo.yaml \
--set modelTunning.enabled=true | oc apply -f -

You can also see some stats from the RHOAI dashboard:
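Before opening the dashboard, a hedged CLI check that the tuning job is progressing (the example creates a PyTorchJob through the Training Operator; the job name 'demo' is an assumption based on pytorchjob-demo.yaml):

# List PyTorchJobs across all namespaces
oc get pytorchjob -A
# Follow the training pods of the job (label set by the Training Operator)
oc get pods -A -l training.kubeflow.org/job-name=demo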
OpenShift AI now includes the possibility to deploy a model registry to store community and customized AI models. This model registry uses a MySQL database as the backend to store metadata and artifacts from your applications. Once deployed, your training pipelines can add an extra step that pushes model metadata to the registry.
Using RHOAI Model Registry you have a centralized source of models as well as a simple way to deploy prepared models:
Here you can find examples of REST requests to query model metadata:
MODEL_REGISTRY_NAME=default
MODEL_REGISTRY_HOST=$(oc get route default-https -n rhoai-model-registries -o go-template='https://{{.spec.host}}')
TOKEN=$(oc whoami -t)
# List models
curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_models?pageSize=100&orderBy=ID&sortOrder=DESC" \
-H "accept: application/json" \
-H "Authorization: Bearer ${TOKEN}" | jq .
# Get a registered model and list its versions
MODEL_NAME="test"
MODEL_ID="4"
curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_model?name=${MODEL_NAME}&externalId=${MODEL_ID}" \
-H "accept: application/json" \
-H "Authorization: Bearer ${TOKEN}" | jq .
curl -s "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha/registered_models/${MODEL_ID}/versions?name=${MODEL_NAME}&pageSize=100&orderBy=ID&sortOrder=DESC" \
-H "accept: application/json" \
-H "Authorization: Bearer ${TOKEN}" | jq .If you want to try this feature, I recommend you to follow the RH documentation:
- Documentation step 1: Configuring the model registry component.
- Documentation step 2: Managing model registries.
- Documentation step 3: Working with model registries.
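For completeness, a hedged sketch of registering a model through the same REST API used in the query examples above (the payload fields are assumptions based on the Model Registry REST spec):

curl -s -X 'POST' "$MODEL_REGISTRY_HOST/api/model_registry/v1alpha3/registered_models" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{"name": "my-model", "description": "Registered from the CLI"}' | jq .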
To ensure that machine-learning models are transparent, fair, and reliable, data scientists can use TrustyAI in OpenShift AI to monitor their data science models for bias and data drift.
TRUSTY_ROUTE=$(oc get route/trustyai-service --template="https://{{.spec.host}}")
# WIP

As the Model Registry is still Tech Preview, we keep documentation about how to manually sync models using an OCP Job and then serve them with OpenShift AI. You can use the following Application, which points to a Helm chart that automates it:
oc apply -f application-serve-mistral-7b.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n mistral-7b

oc apply -f application-serve-granite-1b-a400m.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n granite-1b-a400m

oc apply -f application-serve-nomic-embed-text-v1.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n nomic-embed-text-v1

# Retrieve certificates
openssl s_client -showcerts -connect mistral-7b.mistral-7b.svc.cluster.local:443 </dev/null
# Check models endpoint
curl --cacert /etc/pki/ca-trust/source/anchors/service-ca.crt https://mistral-7b.mistral-7b.svc.cluster.local:443/v1/models
# Check Completion (It might be /v1/chat/completions)
curl -s -X 'POST' https://mistral-7b.mistral-7b.svc.cluster.local/v1/completions -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistral-7b","prompt": "San Francisco is a"}'
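# Chat completions (hedged sketch: follows the OpenAI chat API convention referenced in the comment above)
curl -s -X 'POST' https://mistral-7b.mistral-7b.svc.cluster.local/v1/chat/completions -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistral-7b","messages": [{"role": "user", "content": "San Francisco is a"}]}'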
# Embeddings
curl -s -X 'POST' \
"https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/v1/embeddings" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "nomic-embed-text-v1",
"input": ["En un lugar de la Mancha..."]
}'
# API Endpoints:
# * Ollama => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/api/embed
# * OpenAI => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/embeddings

💡 This section is already fully automated in the GitOps deployment during auto-install.sh, but if you need to deploy it manually, you can follow the steps in this section.
This section will guide you through how we deploy ODF to provide internal S3 storage on the cluster.
❗ Make sure to have at least three worker nodes!
- Install the ODF operator:
  oc apply -k ocp-odf/odf-operator
- Install the ODF cluster:
  oc apply -f ocp-odf/storagecluster-ocs-storagecluster.yaml
- Install RadosGW to provide S3 storage based on Ceph on OCP clusters deployed on cloud providers:
  oc apply -k ocp-odf/radosgw
This workshop guide is a good read to understand the RadosGW configuration.
ℹ️ If you want to test your ODF deployment, not with a real use case but with a fun example, >> Click Here <<
Let’s now test our configuration and create a bucket to store a model in ODF.
- Create a bucket:
  oc apply -k ocp-odf/rhoai-models
- Create a secret with the credentials:
  oc create secret generic hf-creds --from-env-file=hf-creds -n rhoai-models
You just need to retrieve the credentials for the bucket and point to the bucket route URL:
export AWS_ACCESS_KEY_ID=$(oc get secret models -n rhoai-models -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 --decode)
export AWS_SECRET_ACCESS_KEY=$(oc get secret models -n rhoai-models -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 --decode)
export BUCKET_HOST=$(oc get route s3-rgw -n openshift-storage --template='{{ .spec.host }}')
export BUCKET_PORT=$(oc get configmap models -n rhoai-models -o jsonpath='{.data.BUCKET_PORT}')
export BUCKET_NAME="models"
export MODEL_NAME="ibm-granite/granite-3.0-1b-a400m-instruct"

And then execute normal aws-cli commands against the bucket:
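# Hedged example: upload a locally downloaded model into the bucket
# (assumes the model was first fetched into ./models, e.g., with huggingface-cli)
aws s3 sync ./models/${MODEL_NAME} s3://${BUCKET_NAME}/${MODEL_NAME}/ --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT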
aws s3 ls s3://${BUCKET_NAME}/$MODEL_NAME/ --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT

Red Hat OpenShift Lightspeed is a generative AI-powered virtual assistant for OpenShift Container Platform. Lightspeed provides a natural-language interface in the OpenShift web console.
oc apply -f application-ocp-lightspeed.yaml

Or you can deploy it manually with the following command:
oc apply -k components/ocp-lightspeed

The Llama Stack Playground is an application that allows you to test the Llama Stack LLM and Agent capabilities.
cat application-lls-playground.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
LLS_ENDPOINT="http://llama-stack-service.intelligent-cd.svc.cluster.local:8321" \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/lls-playground \
--set global.clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
--set llamaStack.endpoint="http://llama-stack-service.intelligent-cd.svc.cluster.local:8321" \
| oc apply -f -

This demo is fully oriented toward the default, production-ready capabilities provided by OpenShift. However, if your current deployment already uses MinIO and you cannot change it, you can optionally deploy a MinIO application in a side namespace using the following ArgoCD application. This application is included in the auto-install.sh automation:
cat application-minio.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
MINIO_NAMESPACE="minio" MINIO_SERVICE_NAME="minio" \
MINIO_ADMIN_USERNAME="minio" MINIO_ADMIN_PASSWORD="minio123" \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/minio \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
--set namespace="minio" --set service.name="minio" \
--set adminUser.username="minio" --set adminUser.password="minio123" | oc apply -f -

The user and password are minio / minio123.
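To verify the deployment from the CLI, a hedged example using aws-cli against the MinIO route (the route name minio in the namespace minio is an assumption derived from the Helm values above):

export AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123
MINIO_HOST=$(oc get route minio -n minio -o jsonpath='{.spec.host}')   # route name assumed
aws s3 mb s3://test-bucket --endpoint-url https://$MINIO_HOST
aws s3 ls --endpoint-url https://$MINIO_HOST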
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.
cat application-open-webui.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
LLM_INFERENCE_SERVICE_URL="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/open-webui --namespace="open-webui" \
--set llmInferenceService.url="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
--set rag.enabled="true" | oc apply -f -

Milvus is a vector database built for scalable similarity search. It is "open-source, highly scalable, and blazing fast". Milvus offers robust data modeling capabilities, enabling you to organize your unstructured or multi-modal data into structured collections.
Attu is an efficient open-source management tool for Milvus. It features an intuitive graphical user interface (GUI), allowing you to easily interact with your databases.
cat application-milvus.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
envsubst | oc apply -f -

Or you can deploy it manually with the following command:
helm template components/milvus --namespace="milvus" \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') | oc apply -f -

The credentials for the Attu GUI are root / Milvus.
- https://redhatquickcourses.github.io/rhods-admin/rhods-admin/1.33
- https://redhatquickcourses.github.io/rhods-intro/rhods-intro/1.33
- https://redhatquickcourses.github.io/rhods-model/rhods-model/1.33
- https://rh-aiservices-bu.github.io/insurance-claim-processing/modules/02-03-creating-workbench.html
- https://developers.redhat.com/products/red-hat-openshift-ai/getting-started
Known issue: the Model Registry metadata store (MLMD) may fail to start with the following error when the MySQL backend is not ready or uses the MySQL 8.0 default authentication plugin:

Logging before InitGoogleLogging() is written to STDERR
E0223 23:48:09.323659 1 mysql_metadata_source.cc:174] MySQL database was not initialized. Please ensure your MySQL server is running. Also, this error might be caused by the fact that, starting from MySQL 8.0, mysql_native_password used by MLMD is not supported as a default authentication plugin. Please follow https://dev.mysql.com/blog-archive/upgrading-to-mysql-8-0-default-authentication-plugin-considerations/ to fix this issue.
F0223 23:48:09.323763 1 metadata_store_server_main.cc:617] Check failed: absl::OkStatus() == status (OK vs. INTERNAL: mysql_real_connect failed: errno: , error: [mysql-error-info='']) MetadataStore cannot be created with the given connection config.
*** Check failure stack trace: ***