Focused on model inference optimization, such as inference engines and model compression.
- Shanghai
Pinned
- sglang (forked from sgl-project/sglang): SGLang is yet another fast serving framework for large language models and vision language models.
- vllm (Python, forked from vllm-project/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs.
- flashinfer (CUDA, forked from flashinfer-ai/flashinfer): FlashInfer: Kernel Library for LLM Serving.
- flash-attention (Python, forked from Dao-AILab/flash-attention): Fast and memory-efficient exact attention.