Stars
shangshang-wang / Tora
Forked from meta-pytorch/torchtune
Tora: Torchtune-LoRA for RL
Seamless interface for using PyTorch distributed with Jupyter notebooks
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
The glamourous AI coding agent for your favourite terminal 💘
An LLM trained only on data from certain time periods to reduce modern bias
Utils for Unsloth https://github.com/unslothai/unsloth
rl from zero pretrain, can it be done? yes.
Real-time Claude Code usage monitor with predictions and warnings
Development environments for coding agents. Enable multiple agents to work safely and independently with your preferred stack.
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other
Detect and redact PII locally with SOTA performance
QwQ is the reasoning model series developed by Qwen team, Alibaba Cloud.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Prisma Client Python is an auto-generated and fully type-safe database client designed for ease of use
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.