Stars
Community maintained hardware plugin for vLLM on Intel Gaudi
Helper scripts to install pip in a Python installation that doesn't have it.
red-hat-data-services / vllm
Forked from opendatahub-io/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Supercharge Your LLM with the Fastest KV Cache Layer
Source for "Neural Magic Workshop: Hands-On AI Optimization with OpenShift AI" Lab
This repository serves as a comprehensive collection of code examples, research papers, and practical resources from my Generative AI (GenAI) series published on MLWhiz.
Dynamically route user prompts to LoRA adapters or a base LLM using semantic evaluation on Red Hat OpenShift AI with LiteLLM and vLLM.
Resources, demos, recipes, and more for working with LLMs on OpenShift with OpenShift AI or Open Data Hub.
Intel® Gaudi® Software is an implementation of the runtime and graph compiler for Gaudi3
The image registry operator installs and maintains the internal registry on a cluster
Model Context Protocol Servers
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
Tools and pipelines for automated LLM performance evaluation
rhai-code / GenAIInfra
Forked from edlee123/GenAIInfra
Containerization and cloud native suite for OPEA
edlee123 / GenAIInfra
Forked from opea-project/GenAIInfra
Containerization and cloud native suite for OPEA
This repository contains all the Helm charts to deploy the LLM service, the Llama Stack server, the pipeline server configuration, MinIO, and pgvector
Intel® AI Assistant Builder
With OpenVINO Test Drive, users can run large language models (LLMs) and models trained by Intel Geti on their devices, including AI PCs and Edge devices.
Enable RHOAI User Workload Metrics for Single Serving Models
This repository contains Dockerfiles, scripts, YAML files, Helm charts, etc., used to scale out AI containers with versions of TensorFlow and PyTorch that have been optimized for Intel platforms. Sc…