Tsinghua University
Beijing, China (UTC +08:00)
yi2100237651[AT]outlook.com
Lists (21)
attention design
awesome-agent
awesome-diffusion
awesome-image-generation
awesome-llm
deep-research-agent
dllm
dynn
efficient reasoning
hhh
inference-engine
latent-reasoning
learning
mlsys
model architecture
reasoning
RL
test-time-scaling
training-infra
training-scheme
vla
Stars
Block Diffusion for Ultra-Fast Speculative Decoding
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
cao1zhg / sglang
Forked from sgl-project/sglang. SGLang is a fast serving framework for large language models and vision language models.
Batch download helper for 清华大学云盘 (Tsinghua Cloud), for cases where a shared file is too large to download directly; the script also adds several handy extra features.
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
PyTorch building blocks for the OLMo ecosystem
General plug-and-play inference library for Recursive Language Models (RLMs), supporting various sandboxes.
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
Custom cache implementation to fix a KV cache bug in ByteDance/Ouro-1.4B
LLMRouter: An Open-Source Library for LLM Routing
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.
[NeurIPS Spotlight 2025] Official implementation of the paper "Controlling Thinking Speed in Reasoning Models"
[arXiv:2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
[CanadianAI 2025] Code for paper "Intra-Layer Recurrence in Transformers for Language Modeling"
Block-Recurrent Dynamics in ViTs 🦖
LLaDA2.0 is the diffusion language model series developed by the InclusionAI team at Ant Group.
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025)
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
Repository for the paper https://arxiv.org/abs/2504.13837
Accelerating MoE with IO and Tile-aware Optimizations
A minimal yet professional single-agent demo project that showcases the core execution pipeline and production-grade features of agents.