Highlights

  • Pro

Organizations

@AberAberAber @Minimap2onGPU @EECS471-GPU-Progamming

A tool for working with stacked MRs on gitlab.

Python 3 Updated Dec 4, 2025

Cute layout visualization

Python 25 4 Updated Nov 21, 2025

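A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to help readers gain a thorough mastery of metaprogramming. (Work in progress)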

C++ 10,496 1,621 Updated Aug 20, 2024

Triton-based Symmetric Memory operators and examples

Python 75 11 Updated Jan 15, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,223 86 Updated Aug 28, 2025
Python 155 14 Updated Dec 27, 2024

GitHub mirror of the triton-lang/triton repo.

MLIR 123 33 Updated Jan 15, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 711 94 Updated Jan 13, 2026

A Bionic Reading extension for Zotero with verb and noun highlighting

TypeScript 122 Updated Apr 11, 2025

Bionic reading experience with Zotero.

TypeScript 328 6 Updated Aug 7, 2025

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.

C++ 1,005 165 Updated Sep 19, 2024

CUDA Core Compute Libraries

C++ 2,124 320 Updated Jan 15, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 8,886 1,060 Updated Dec 29, 2025

A massively parallel, high-level programming language

Rust 19,132 469 Updated Jun 3, 2025

CXLMemSim: A pure software simulated CXL.mem for performance characterization

C 323 43 Updated Jan 8, 2026

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,907 334 Updated Nov 28, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 4,710 397 Updated Jan 15, 2026

NVIDIA Linux open GPU kernel module source

C 16,619 1,562 Updated Jan 13, 2026

Fastest kernels written from scratch

Cuda 523 61 Updated Sep 18, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,637 2,010 Updated Jan 15, 2026

A high-performance, Pythonic language for bioinformatics

C++ 706 48 Updated Dec 8, 2022

Dynamic Memory Management for Serving LLMs without PagedAttention

C 455 35 Updated May 30, 2025

File system and storage benchmark that uses a custom language to generate a large variety of workloads.

C 376 134 Updated Aug 18, 2024

Multi-DBMS SQL Benchmarking Framework via JDBC

Java 611 212 Updated Dec 13, 2025

Ancillary open source software to support confidential computing on NVIDIA GPUs

Python 293 51 Updated Dec 23, 2025

Helpful tools and examples for working with flex-attention

Python 1,109 70 Updated Jan 14, 2026

Submit stacked diffs to GitHub on the command line

Python 902 78 Updated Dec 30, 2025

Python library for embedding inference of relational tables.

Python 3 Updated Jul 8, 2024

Hyrise is a research in-memory database.

C++ 1 Updated Jun 11, 2024
C 1 Updated Apr 10, 2024