🎯
Focusing is all you need


OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,502 716 Updated Dec 30, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 602 127 Updated Jan 1, 2026

Extremely fast non-cryptographic hash algorithm

C 10,677 874 Updated Dec 17, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,517 467 Updated Dec 31, 2025

PDF textbooks for all primary, middle, and high school grades and for university.

Roff 63,605 14,137 Updated Oct 18, 2025

Python 6 1 Updated Oct 23, 2025

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

Python 52 4 Updated Dec 30, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,948 924 Updated Dec 15, 2025

Python 4,477 433 Updated Sep 14, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,466 640 Updated Dec 31, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,451 233 Updated Nov 12, 2025

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Go 4,958 1,330 Updated Dec 30, 2025

Achieve state-of-the-art inference performance with modern accelerators on Kubernetes

Shell 2,244 273 Updated Dec 19, 2025

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,242 189 Updated Mar 27, 2024

Kimi K2 is the large language model series developed by the Moonshot AI team

9,787 715 Updated Nov 7, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,970 2,940 Updated Jan 2, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,025 163 Updated Dec 20, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 515 37 Updated Feb 10, 2025

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 20,061 1,682 Updated Nov 26, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 6,581 826 Updated Jan 2, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,303 115 Updated Dec 27, 2025

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Python 116 1 Updated Jul 1, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,983 301 Updated Dec 22, 2025

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 66 7 Updated Sep 15, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,543 79 Updated Nov 16, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,897 312 Updated Mar 10, 2025

Analyze computation-communication overlap in V3/R1.

1,130 144 Updated Mar 21, 2025

Expert Parallelism Load Balancer

Python 1,327 196 Updated Mar 24, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,853 1,046 Updated Dec 29, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,298 1,196 Updated Jan 2, 2026