flashinfer-dev Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Oct 30, 2025
Megatron-LM Public
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Python Other Updated Apr 1, 2025
flux Public
Forked from bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
C++ Apache License 2.0 Updated Mar 14, 2025
cmu-catalyst.github.io Public
Forked from cmu-catalyst/cmu-catalyst.github.io
HTML Other Updated Jan 25, 2025
xgrammar Public
Forked from mlc-ai/xgrammar
Efficient, Flexible and Portable Structured Generation
C++ Apache License 2.0 Updated Nov 27, 2024
open-gpu-kernel-modules Public
Forked from NVIDIA/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
C Other Updated Sep 14, 2024
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Sep 12, 2024
triton Public
Forked from triton-lang/triton
Development repository for the Triton language and compiler
C++ MIT License Updated Aug 22, 2024
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Jul 24, 2024
mirage Public
Forked from mirage-project/mirage
A multi-level tensor algebra superoptimizer
texmacs Public
Forked from texmacs/texmacs
Source Code of GNU TeXmacs, Developers Guide ==>
Tcl GNU General Public License v3.0 Updated Apr 24, 2024
mlx Public
Forked from ml-explore/mlx
MLX: An array framework for Apple silicon
C++ MIT License Updated Feb 19, 2024
pbrt-v4 Public
Forked from mmp/pbrt-v4
Source code to pbrt, the ray tracer described in the forthcoming 4th edition of the "Physically Based Rendering: From Theory to Implementation" book.
C++ Apache License 2.0 Updated Feb 17, 2024
metal-benchmarks Public
Forked from philipturner/metal-benchmarks
Apple GPU microarchitecture
Metal MIT License Updated Jan 31, 2024
nccl Public
Forked from NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
C++ Other Updated Jan 9, 2024
flashinfer-ai.github.io Public
Forked from flashinfer-ai/flashinfer-ai.github.io
Project website of FlashInfer project
HTML Updated Jan 6, 2024
tvm Public
Forked from apache/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Python Apache License 2.0 Updated Jan 2, 2024
punica Public
Forked from punica-ai/punica
Serving multiple LoRA finetuned LLM as one
mlc-llm Public
Forked from mlc-ai/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
uwsampl.github.io Public
Forked from uwsampl/uwsampl.github.io
The UW SAMPL group's website.
HTML Other Updated Sep 5, 2023
relax-sparse Public
Forked from tlc-pack/relax
Temp repo for prototyping relax (relay next), the effort will be upstreamed. We use the wiki pages on this repo to host design docs.
Python Apache License 2.0 Updated Jun 10, 2023