Starred repositories
The absolute trainer to light up AI agents.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
This is the official repository for our recent work: PIDNet
The official implementation of "Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes"
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
A framework for few-shot evaluation of language models.
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
SGLang is a fast serving framework for large language models and vision language models.
Transforms complex documents such as PDFs into LLM-ready Markdown/JSON for your agentic workflows.
Official PyTorch implementation for "Large Language Diffusion Models"
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
"AutoAgent: Fully-Automated and Zero-Code LLM Agent Framework"
[NeurIPS2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
🧮 Calculator for vision tokens in VLMs.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
A high-throughput and memory-efficient inference and serving engine for LLMs
Awesome-Paper-list: Visualization meets LLM
An AI-powered task-management system you can drop into Cursor, Lovable, Windsurf, Roo, and others.
A powerful AI coding agent. Built for the terminal.
Code-MCP: Connect Claude AI to your development environment through the Model Context Protocol (MCP), enabling terminal commands and file operations through the AI interface.
Machine Learning Engineering Open Book
[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
Model Context Protocol Servers