ZZK MARD1NO

🎯 Focusing

Paddle very good

383 followers · 424 following

SiliconFlow
Neverland
https://mard1no.github.io/

Achievements

x2 x3

DLSlime Public
Forked from DeepLink-org/DLSlime

DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit

C++ BSD 3-Clause "New" or "Revised" License Updated Sep 18, 2025
NVSHMEM-Tutorial Public
Forked from KuangjuX/NVSHMEM-Tutorial

Cuda Updated Sep 16, 2025
tvm-ffi Public
Forked from apache/tvm-ffi

TVM FFI

C++ Apache License 2.0 Updated Sep 15, 2025
checkpoint-engine Public
Forked from MoonshotAI/checkpoint-engine

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python MIT License Updated Sep 10, 2025
batch_invariant_ops Public
Forked from thinking-machines-lab/batch_invariant_ops

Python MIT License Updated Sep 10, 2025
NVSHMEM Public
Forked from NVIDIA/nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ Other Updated Sep 6, 2025
uccl Public
Forked from uccl-project/uccl

Ultra and Unified CCL

C++ Apache License 2.0 Updated Aug 15, 2025
VeOmni Public
Forked from ByteDance-Seed/VeOmni

VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

Python Apache License 2.0 Updated Aug 12, 2025
gpt-oss Public
Forked from openai/gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python Apache License 2.0 Updated Aug 6, 2025
kraken Public
Forked from meta-pytorch/kraken

Triton-based Symmetric Memory operators and examples

Python Other Updated Jul 31, 2025
StepMesh Public
Forked from stepfun-ai/StepMesh

C++ Apache License 2.0 Updated Jul 29, 2025
cutedsl-utilities Public
Forked from HanGuo97/cutedsl-utilities

Python Updated Jul 22, 2025
tilelang Public
Forked from tile-ai/tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ MIT License Updated Jul 17, 2025
FastDeploy Public
Forked from PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

C++ 1 Apache License 2.0 Updated Jun 30, 2025
maple-font Public
Forked from subframe7536/maple-font

Maple Mono: Open source monospace font with round corners, ligatures and Nerd-Font icons for IDE and terminal, a perfect 2:1 width ratio between Chinese and English characters, and fine-grained customization options.

Python SIL Open Font License 1.1 Updated Jun 17, 2025
nano-vllm Public
Forked from GeeeekExplorer/nano-vllm

Nano vLLM

Python MIT License Updated Jun 15, 2025
tritonparse Public
Forked from meta-pytorch/tritonparse

TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code mappings.

TypeScript BSD 3-Clause "New" or "Revised" License Updated Jun 14, 2025
CPM.cu Public
Forked from OpenBMB/CPM.cu

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…

Cuda Apache License 2.0 Updated Jun 12, 2025
BitDecoding Public
Forked from DD-DuDa/BitDecoding

A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.

C++ MIT License Updated Jun 10, 2025
tiny-llm Public
Forked from skyzh/tiny-llm

A course on LLM inference serving on Apple Silicon for systems engineers.

Python Apache License 2.0 Updated Jun 7, 2025
tokasaurus Public
Forked from ScalingIntelligence/tokasaurus

Python Apache License 2.0 Updated Jun 5, 2025
Megakernels Public
Forked from HazyResearch/Megakernels

kernels, of the mega variety

Python MIT License Updated May 27, 2025
ib-traffic-monitor Public
Forked from NVIDIA/ib-traffic-monitor

A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node

C Apache License 2.0 Updated May 21, 2025
llm-d Public
Forked from llm-d/llm-d

llm-d is a Kubernetes-native high-performance distributed LLM inference framework

Makefile Apache License 2.0 Updated May 21, 2025
NVIDIA-Hopper-Benchmark Public
Forked from HPMLL/NVIDIA-Hopper-Benchmark

C++ GNU General Public License v3.0 Updated May 16, 2025
cuda-side-boost Public
Forked from ademeure/cuda-side-boost

Cuda MIT License Updated May 5, 2025
helion Public
Forked from pytorch/helion

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python BSD 3-Clause "New" or "Revised" License Updated May 5, 2025
FlashOverlap Public
Forked from infinigence/FlashOverlap

A lightweight design for computation-communication overlap.

Cuda Apache License 2.0 Updated Apr 29, 2025
DeepEP_ibrc_dual-ports_multiQP Public
Forked from Infrawaves/DeepEP_ibrc_dual-ports_multiQP

Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport

Cuda Updated Apr 27, 2025
cuda-demo Public

Cuda 3 Updated Apr 25, 2025