Starred repositories

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 3,093 238 Updated Nov 28, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,613 288 Updated Nov 28, 2025

Adamas: Hadamard Sparse Attention for Efficient Long-context Inference

Cuda 6 1 Updated Nov 25, 2025

Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI.

Python 237 17 Updated Oct 4, 2025

Python 51 4 Updated Oct 28, 2025

Contexts Optical Compression

Python 20,983 1,848 Updated Oct 25, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 5,774 575 Updated Oct 31, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,733 271 Updated Nov 28, 2025

Perplexity GPU Kernels

C++ 532 72 Updated Nov 7, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,352 109 Updated Nov 28, 2025

Bridge Megatron-Core to Hugging Face/Reinforcement Learning

Python 164 34 Updated Nov 27, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 500 112 Updated Nov 27, 2025

Kimi K2 is the large language model series developed by Moonshot AI team

9,610 682 Updated Nov 7, 2025

Nano vLLM

Python 9,338 1,150 Updated Nov 3, 2025

The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab.

468 23 Updated Aug 20, 2025

Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"

Python 710 59 Updated Nov 28, 2025

Scalable toolkit for efficient model reinforcement

Python 1,048 173 Updated Nov 28, 2025

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

Python 347 33 Updated May 3, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,760 324 Updated Nov 28, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 507 36 Updated Feb 10, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 16,812 2,676 Updated Nov 27, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,474 820 Updated Nov 9, 2025

Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)

Python 51 7 Updated Mar 14, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 51,036 8,904 Updated Nov 17, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 6,228 757 Updated Nov 28, 2025

Tile primitives for speedy kernels

Cuda 2,955 203 Updated Nov 28, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,916 757 Updated Nov 25, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,885 905 Updated Sep 30, 2025

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 490 256 Updated Nov 28, 2025

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,487 688 Updated Nov 27, 2025