Joy Juechu Dong joydddd

PhD cadi@umich | GPU Kernel Opt/Confidential Computing

82 followers · 23 following

Lists (5)

EECS388

EECS570

Stars

River707 / stack-mr

Forked from modular/stack-pr

A tool for working with stacked MRs on gitlab.

Python 3 Updated Dec 4, 2025

NTT123 / cute-viz

Cute layout visualization

Python 25 4 Updated Nov 21, 2025

wuye9036 / CppTemplateTutorial

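A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in its own right, aiming to give readers a thorough command of metaprogramming. (Work in progress)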

C++ 10,496 1,621 Updated Aug 20, 2024

meta-pytorch / kraken

Triton-based Symmetric Memory operators and examples

Python 75 11 Updated Jan 15, 2026

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,223 86 Updated Aug 28, 2025

yifuwang / symm-mem-recipes

Python 155 14 Updated Dec 27, 2024

facebookexperimental / triton

GitHub mirror of the triton-lang/triton repo.

MLIR 123 33 Updated Jan 15, 2026

pytorch / helion

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 711 94 Updated Jan 13, 2026

DrUsagi / Colorful-Bionic

A Bionic Reading Extension for Zotero with Verbs and Nouns Highlight

TypeScript 122 Updated Apr 11, 2025

windingwind / bionic-for-zotero

Bionic reading experience with Zotero.

TypeScript 328 6 Updated Aug 7, 2025

microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.

C++ 1,005 165 Updated Sep 19, 2024

NVIDIA / cccl

CUDA Core Compute Libraries

C++ 2,124 320 Updated Jan 15, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,886 1,060 Updated Dec 29, 2025

HigherOrderCO / Bend

A massively parallel, high-level programming language

Rust 19,132 469 Updated Jun 3, 2025

SlugLab / CXLMemSim

CXLMemSim: A pure software simulated CXL.mem for performance characterization

C 323 43 Updated Jan 8, 2026

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,907 334 Updated Nov 28, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 4,710 397 Updated Jan 15, 2026

NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

C 16,619 1,562 Updated Jan 13, 2026

pranjalssh / fast.cu

Fastest kernels written from scratch

Cuda 523 61 Updated Sep 18, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,637 2,010 Updated Jan 15, 2026

seq-lang / seq

A high-performance, Pythonic language for bioinformatics

C++ 706 48 Updated Dec 8, 2022

microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C 455 35 Updated May 30, 2025

filebench / filebench

File system and storage benchmark that uses a custom language to generate a large variety of workloads.

C 376 134 Updated Aug 18, 2024

cmu-db / benchbase

Multi-DBMS SQL Benchmarking Framework via JDBC

Java 611 212 Updated Dec 13, 2025

NVIDIA / nvtrust

Ancillary open source software to support confidential computing on NVIDIA GPUs

Python 293 51 Updated Dec 23, 2025

meta-pytorch / attention-gym

Helpful tools and examples for working with flex-attention

Python 1,109 70 Updated Jan 14, 2026

ezyang / ghstack

Submit stacked diffs to GitHub on the command line

Python 902 78 Updated Dec 30, 2025

superctj / observatory-library

Python library for embedding inference of relational tables.

Python 3 Updated Jul 8, 2024

joydddd / hyrise

Forked from hyrise/hyrise

Hyrise is a research in-memory database.

C++ 1 Updated Jun 11, 2024

joydddd / llama2.c

C 1 Updated Apr 10, 2024