Stars
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
A Robust Approach for LiDAR-Inertial Odometry Without Sensor-Specific Modelling
[ICRA 2025] Interactive4D: Interactive 4D LiDAR Segmentation
The Most Faithful Implementation of Segment Anything (SAM) in 3D
BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, Unified model management, Evaluation,…
verl: Volcano Engine Reinforcement Learning for LLMs
🚀 The fast, Pythonic way to build MCP servers and clients
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Deep learning model implementation for fire detection, covering both classification and segmentation, on the FLAME dataset.
Open source alternative to Gemini Deep Research. Generate reports with AI based on search results.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Fully open reproduction of DeepSeek-R1
A simple screen parsing tool towards a pure-vision-based GUI agent
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, con…
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge ba…
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.