- SkyWork
- ChengDu
- www.giantpandacv.com
-
cache-dit Public
Forked from vipshop/cache-dit
🤗 A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs: Z-Image, FLUX2, Qwen-Image, etc.
Python Apache License 2.0 Updated Jan 13, 2026
-
How to optimize some algorithms in CUDA.
-
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
-
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
-
tilelang Public
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
-
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Jul 14, 2025
-
Awesome-ML-SYS-Tutorial Public
Forked from zhaochenyang20/Awesome-ML-SYS-Tutorial
My learning notes/codes for ML SYS.
-
Panzhihua-Mi-Yi-Pipa Public
If you want to purchase Panzhihua Mi Yi Pipa, please contact me.
-
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Cuda MIT License Updated Feb 27, 2025
-
ml-engineering Public
Forked from stas00/ml-engineering
Machine Learning Engineering Open Book
-
HunyuanVideo Public
Forked from Tencent-Hunyuan/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Python Other Updated Dec 20, 2024
-
ao Public
Forked from pytorch/ao
PyTorch native quantization and sparsity for training and inference
-
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
-
TiledCUDA Public
Forked from TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
C++ MIT License Updated Sep 6, 2024
-
Image-processing-algorithm Public
Paper implementations
-
deepseekv2-profile Public
Forked from madsys-dev/deepseekv2-profile
Jupyter Notebook Updated May 31, 2024
-
accelerate Public
Forked from huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
-
nndeploy Public
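Forked from nndeploy/nndeploy
nndeploy is an end-to-end model deployment framework. Built around multi-backend inference and directed acyclic graph (DAG) based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.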
-
kineto Public
Forked from pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
HTML Other Updated Apr 15, 2024