- Worked at Kuaishou, Baidu, Meituan
- Beijing
- https://ageliss.github.io/gqjiang/
Starred repositories
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
slime is an LLM post-training framework for RL scaling.
Adamas: Hadamard Sparse Attention for Efficient Long-context Inference
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI.
Multilingual Document Layout Parsing in a Single Vision-Language Model
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
Train speculative decoding models effortlessly and port them smoothly to SGLang serving (speculative decoding itself is sketched after this list).
Kimi K2 is the large language model series developed by Moonshot AI team
The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab.
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
Scalable toolkit for efficient model reinforcement
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
📚 A curated list of awesome LLM/VLM inference papers with code: FlashAttention, PagedAttention, WINT8/4, parallelism, etc. 🎉
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
verl: Volcano Engine Reinforcement Learning for LLMs
An easy-to-use, scalable, and high-performance RLHF framework built on Ray (PPO, GRPO, REINFORCE++, vLLM, dynamic sampling, async agentic RL).
HArmonizedSS / HASS, forked from SafeAILab/EAGLE
Official implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
No fortress, purely open ground. OpenManus is Coming.
Supercharge Your LLM with the Fastest KV Cache Layer (the KV-cache idea behind this is sketched after this list)
Tile primitives for speedy kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient Multi-head Latent Attention Kernels
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
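Several of the starred repos (LayerSkip, HASS, the SGLang speculative-decoding trainer) revolve around speculative decoding: a cheap draft model proposes several tokens and the expensive target model verifies them in one pass. Below is a minimal greedy sketch of that loop with toy stand-in functions; `draft_model`, `target_model`, and the integer "tokens" are invented for illustration and are not the API of any repo above.

```python
# Minimal greedy speculative decoding sketch (illustrative only).
# In practice the draft is a small LLM and the target a large LLM,
# and the target verifies all k proposals in a single forward pass.

def draft_model(context):
    # Toy draft: deterministically proposes the next token in a cycle.
    return (context[-1] + 1) % 10

def target_model(context):
    # Toy target: agrees with the draft except on multiples of 4.
    nxt = (context[-1] + 1) % 10
    return nxt if nxt % 4 else (nxt + 1) % 10

def speculative_decode(prompt, new_tokens=8, k=4):
    tokens = list(prompt)
    while new_tokens > 0:
        # 1. Draft proposes k tokens cheaply.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            ctx.append(draft_model(ctx))
            proposal.append(ctx[-1])
        # 2. Target verifies; keep the longest agreeing prefix.
        accepted = 0
        for i, t in enumerate(proposal):
            expected = target_model(tokens + proposal[:i])
            accepted = i + 1
            if expected != t:
                tokens.append(expected)  # substitute the target's token
                break
            tokens.append(t)
        new_tokens -= accepted  # several tokens committed per target pass
    return tokens

print(speculative_decode([0]))  # -> [0, 1, 2, 3, 5, 6, 7, 9, 1]
```

The payoff is that one target pass can commit several tokens at once while producing exactly the tokens the target alone would have produced; the draft's only job is to guess well enough that the accepted prefix is usually longer than one.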
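LMCache and FlashMLA both build on the KV cache that makes autoregressive attention incremental: keys and values for the prefix are stored once, so each decode step projects only the newest token. Below is a single-head NumPy sketch of that idea; the dimensions, random weights, and `KVCache` class are made up for illustration, and real systems add batching, paging, quantization, and cross-request sharing on top.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy head dimension
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    return (weights / weights.sum()) @ V

class KVCache:
    """Stores per-token keys/values so each decode step projects only the
    newest token and reuses the prefix's K/V instead of recomputing them."""
    def __init__(self):
        self.K = np.empty((0, D))
        self.V = np.empty((0, D))

    def step(self, x):
        # Append this token's key/value, then attend over the full prefix.
        self.K = np.vstack([self.K, x @ Wk])
        self.V = np.vstack([self.V, x @ Wv])
        return attend(x @ Wq, self.K, self.V)

cache = KVCache()
for _ in range(4):                    # stand-in for a 4-token decode loop
    out = cache.step(rng.standard_normal(D))
print(cache.K.shape)                  # (4, 8): one cached K row per token
```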