Stars
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Post-training with Tinker
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models).
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
[ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Official Repo for Open-Reasoner-Zero
verl: Volcano Engine Reinforcement Learning for LLMs
Agentless🐱: an agentless approach to automatically solve software development problems
DSPy: The framework for programming, not prompting, language models
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Minimalistic large language model 3D-parallelism training
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Fully open reproduction of DeepSeek-R1
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
A generative world for general-purpose robotics & embodied AI learning.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
[ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct
Entropy Based Sampling and Parallel CoT Decoding