- Tsinghua University
- Haidian, Beijing
Lists (3)
Stars
A collection of papers and projects that train flow matching models/policies via RL.
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
This repository summarizes recent advances in the VLA + RL paradigm and provides a taxonomic classification of relevant works.
RLinf / latex2sympy2
Forked from IuvenisSapiens/latex2sympy2. Parse LaTeX math expressions.
RLinf / LLMEvalKit
Forked from QwenLM/Qwen2.5-Math. A lightweight LLM evaluation toolkit for RLinf. Supports mathematical reasoning and long CoT tasks.
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments
Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning
What Matters in Learning A Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
Code for the paper "A Comparison of Imitation Learning Algorithms for Bimanual Manipulation" (Drolet et al., 2024)
Reference implementation for DPO (Direct Preference Optimization)
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
This is the official implementation of Multi-Agent PPO (MAPPO).
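Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins; supports project analysis and self-explanation for Python, C++, and other languages; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…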
This is a repository for Hidden-utility Self-Play.
Code for "On the Utility of Learning about Humans for Human-AI Coordination"
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Repository Containing the Code associated with the Paper: "Learning High-Speed Flight in the Wild"
aw_nas: A Modularized and Extensible NAS Framework
SLAM algorithms and systems based on Neural Networks.
HongbiaoZ / autonomous_exploration_development_environment
Forked from jizhang-cmu/ground_based_autonomy_basic. Leveraging system development and robot deployment for ground-based autonomous navigation and exploration.
MineRL Competition for Sample Efficient Reinforcement Learning - Python Package