cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Aug 7, 2025
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Python BSD 3-Clause "New" or "Revised" License Updated Aug 5, 2025
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Aug 1, 2025
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Jun 24, 2025
flux Public
Forked from bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
C++ Apache License 2.0 Updated Jun 18, 2025
nccl Public
Forked from NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
C++ Other Updated Apr 28, 2025
DeepEP Public
Forked from deepseek-ai/DeepEP
DeepEP: an efficient expert-parallel communication library
Cuda MIT License Updated Feb 25, 2025
BitBLAS Public
Forked from microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Python MIT License Updated Jan 11, 2025
ZhiLight Public
Forked from zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
C++ Apache License 2.0 Updated Jan 9, 2025
cuda-samples Public
Forked from NVIDIA/cuda-samples
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
C Other Updated Dec 15, 2024
TransformerEngine Public
Forked from NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Python Apache License 2.0 Updated Sep 20, 2024
marlin Public
Forked from IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
Python Apache License 2.0 Updated Sep 4, 2024
cuda_examples Public
Personal project maintaining CUDA-related examples.
Cuda GNU General Public License v3.0 Updated Dec 4, 2022
YHs_Sample Public
Forked from Yinghan-Li/YHs_Sample
cuda code benchmark
Cuda GNU General Public License v3.0 Updated Jul 25, 2022
CS-Books Public
Forked from huihut/CS-Books
📚 Computer Science Books (PDFs of computer technology books)
Updated Dec 26, 2019
kaldi-io-for-python Public
Forked from KarelVesely84/kaldi-io-for-python
Python functions for reading kaldi data formats. Useful for rapid prototyping with python.
Python Apache License 2.0 Updated Sep 19, 2019
ChineseNER Public
Forked from buppt/ChineseNER
Chinese named entity recognition, entity extraction, tensorflow, pytorch, BiLSTM+CRF
Python Updated Dec 3, 2018
keras-kaldi Public
Forked from dspavankumar/keras-kaldi
Keras Interface for Kaldi ASR
Python GNU General Public License v3.0 Updated Sep 27, 2017