DLSlime Public
Forked from DeepLink-org/DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
C++ BSD 3-Clause "New" or "Revised" License Updated Sep 18, 2025
checkpoint-engine Public
Forked from MoonshotAI/checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
Python MIT License Updated Sep 10, 2025
batch_invariant_ops Public
Forked from thinking-machines-lab/batch_invariant_ops
Python MIT License Updated Sep 10, 2025
NVSHMEM Public
Forked from NVIDIA/nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
C++ Other Updated Sep 6, 2025
uccl Public
Forked from uccl-project/uccl
Ultra and Unified CCL
C++ Apache License 2.0 Updated Aug 15, 2025
VeOmni Public
Forked from ByteDance-Seed/VeOmni
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
Python Apache License 2.0 Updated Aug 12, 2025
gpt-oss Public
Forked from openai/gpt-oss
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Python Apache License 2.0 Updated Aug 6, 2025
kraken Public
Forked from meta-pytorch/kraken
Triton-based Symmetric Memory operators and examples
Python Other Updated Jul 31, 2025
tilelang Public
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
C++ MIT License Updated Jul 17, 2025
FastDeploy Public
Forked from PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
maple-font Public
Forked from subframe7536/maple-font
Maple Mono: Open source monospace font with round corner, ligatures and Nerd-Font for IDE and terminal, fine-grained customization options. Rounded monospace font with ligatures and console icons; Chinese and English glyph widths are a perfect 2:1; fine-grained customization options
Python SIL Open Font License 1.1 Updated Jun 17, 2025
nano-vllm Public
Forked from GeeeekExplorer/nano-vllm
Nano vLLM
Python MIT License Updated Jun 15, 2025
tritonparse Public
Forked from meta-pytorch/tritonparse
TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code mappings.
TypeScript BSD 3-Clause "New" or "Revised" License Updated Jun 14, 2025
CPM.cu Public
Forked from OpenBMB/CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…
Cuda Apache License 2.0 Updated Jun 12, 2025
BitDecoding Public
Forked from DD-DuDa/BitDecoding
A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.
C++ MIT License Updated Jun 10, 2025
tiny-llm Public
Forked from skyzh/tiny-llm
A course of LLM inference serving on Apple Silicon for systems engineers.
Python Apache License 2.0 Updated Jun 7, 2025
tokasaurus Public
Forked from ScalingIntelligence/tokasaurus
Python Apache License 2.0 Updated Jun 5, 2025
Megakernels Public
Forked from HazyResearch/Megakernels
kernels, of the mega variety
Python MIT License Updated May 27, 2025
ib-traffic-monitor Public
Forked from NVIDIA/ib-traffic-monitor
A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node
C Apache License 2.0 Updated May 21, 2025
llm-d Public
Forked from llm-d/llm-d
llm-d is a Kubernetes-native high-performance distributed LLM inference framework
Makefile Apache License 2.0 Updated May 21, 2025
NVIDIA-Hopper-Benchmark Public
Forked from HPMLL/NVIDIA-Hopper-Benchmark
C++ GNU General Public License v3.0 Updated May 16, 2025
helion Public
Forked from pytorch/helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Python BSD 3-Clause "New" or "Revised" License Updated May 5, 2025
FlashOverlap Public
Forked from infinigence/FlashOverlap
A lightweight design for computation-communication overlap.
Cuda Apache License 2.0 Updated Apr 29, 2025
DeepEP_ibrc_dual-ports_multiQP Public
Forked from Infrawaves/DeepEP_ibrc_dual-ports_multiQP
Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport
Cuda Updated Apr 27, 2025