Stars
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
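A minimal sketch of the "change a single line" pattern this describes: pointing the standard OpenAI Python client at a locally running, OpenAI-compatible Xinference endpoint. The base URL, port, and model name below are placeholders/assumptions, not documented defaults.

```python
# Sketch: reuse the official OpenAI client against a local Xinference server
# by swapping only the base_url. Endpoint and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # hypothetical local Xinference endpoint
    api_key="not-used",                   # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="qwen2-instruct",               # whichever model you launched in Xinference
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```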
Distributed reliable key-value store for the most critical data of a distributed system
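For context, a hedged sketch of talking to such a key-value store from Python using the third-party python-etcd3 client (an assumption, not part of the etcd project itself); host and port are the usual defaults but depend on your deployment.

```python
# Sketch: read/write etcd from Python via the third-party python-etcd3 client
# (pip install etcd3). Host/port are assumptions about the deployment.
import etcd3

etcd = etcd3.client(host="localhost", port=2379)
etcd.put("/config/feature_flag", "enabled")   # write a key
value, metadata = etcd.get("/config/feature_flag")
print(value.decode())                          # -> "enabled"
```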
The Triton TensorRT-LLM Backend
Optimize Qwen1.5 models with TensorRT-LLM
Retrieval and Retrieval-augmented LLMs
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Production-ready platform for agentic workflow development.
LLM API management & distribution system supporting mainstream models such as OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, 文心一言, 讯飞星火, 通义千问, 360 智脑, and Tencent Hunyuan, with unified API adaptation; usable for key management and secondary distribution. Ships as a single executable with a Docker image for one-click deployment, ready to use out of the box.
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
A modern download manager that supports all platforms. Built with Golang and Flutter.
A re-implementation of Meta-Prompt in LangChain for building self-improving agents.
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
The official Python library for the OpenAI API
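A basic, hedged usage example with the current (v1-style) client; the API key is read from the environment and the model name is a placeholder.

```python
# Basic chat completion with the official OpenAI Python client.
# Assumes OPENAI_API_KEY is set; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what a connection pool does."},
    ],
)
print(completion.choices[0].message.content)
```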
fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference of dense models and mixed-mode inference of MoE models; any GPU with 10 GB+ of memory can run the full DeepSeek model. A dual-socket 9004/9005 server plus a single GPU can serve the original full-precision, full-size DeepSeek model at 20 tps with single concurrency; the INT4-quantized model reaches 30 tps with single concurrency and 60+ tps under concurrent load.
✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Visualizer for neural network, deep learning and machine learning models
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama, for local knowledge-based LLM question answering.
A scalable inference server for models optimized with OpenVINO™
A flexible, high-performance serving system for machine learning models
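TensorFlow Serving exposes a REST predict endpoint alongside gRPC; a short sketch of calling it with `requests`, where the port, model name, and input shape are assumptions about how the server was started.

```python
# Sketch: call TensorFlow Serving's REST predict endpoint. Assumes the server
# was started with --rest_api_port=8501 and serves a model named "my_model";
# both names and the input shape are placeholders.
import requests

payload = {"instances": [[1.0, 2.0, 5.0]]}  # must match the model's input signature
resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    json=payload,
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["predictions"])
```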
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Connection pool for Go's gRPC client that supports connection reuse.