Sun Yat-sen University
- Guangzhou (UTC +08:00)
- https://gty111.github.io/info/
- https://orcid.org/0009-0005-2979-4486
Lists (19)
AI
Benchmark
Compiler & DSL
CV & CG
Diffusion
Framework
Hardware
HPC
Instrumentation & Reverse & Assemble
LAB
Math
NLP
Operating Systems
Recommendation
ROCM
Simulators
Template & Theme
Tools
Tutorial & Examples
Stars
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
FlashMLA: Efficient Multi-head Latent Attention Kernels
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Official PyTorch implementation for "Large Language Diffusion Models"
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Kimi K2 is the large language model series developed by the Moonshot AI team
verl: Volcano Engine Reinforcement Learning for LLMs
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Supercharge Your LLM with the Fastest KV Cache Layer
Distributed Compiler based on Triton for Parallel Systems
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
Analyze computation-communication overlap in V3/R1.
DeepEP: an efficient expert-parallel communication library
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations