Stars
All-in-One Sandbox for AI Agents that combines Browser, Shell, File, MCP and VSCode Server in a single Docker container.
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
DeepEvolve is a research and coding agent for new algorithm discovery in different science domains with Deep Research and AlphaEvolve.
OceanGym: A Benchmark Environment for Underwater Embodied Agents
pathlib API extended to use fsspec backends
Renderer for the harmony response format to be used with gpt-oss
DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solving. The framework leverages a top-level planning agent to coo…
Build, evaluate and train General Multi-Agent Assistance with ease
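This repository contains complete research and analysis materials from reverse engineering Claude Code v1.0.33. It includes in-depth technical analysis of the obfuscated source code, system architecture documentation, and an implementation blueprint for reconstructing the Claude Code agent system. Key findings include the real-time Steering mechanism, the multi-agent architecture, intelligent context management, and the tool execution pipeline. The project offers a technical reference for understanding the design and implementation of modern AI agent systems.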
Democratizing Reinforcement Learning for LLMs
What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
verl: Volcano Engine Reinforcement Learning for LLMs
Development environments for coding agents. Enable multiple agents to work safely and independently with your preferred stack.
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)
Pocket Flow: 100-line LLM framework. Let Agents build Agents!
LookAhead Tuning: Safer Language Models via Partial Answer Previews
[NeurIPS2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
A lightweight, powerful framework for multi-agent workflows
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Automatically update arXiv papers about LLM Reasoning, LLM Evaluation, LLM & MLLM and Video Understanding using GitHub Actions.
A reading list on Large Language Model-based data science agents
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge ba…
[TMLR 2024] Efficient Large Language Models: A Survey