Nanjing University · Shanghai, China · wrhuang.top
Stars
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
A throughput-oriented high-performance serving framework for LLMs
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
High performance Transformer implementation in C++.
Disaggregated serving system for Large Language Models (LLMs).
Sample codes for my CUDA programming book
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Development repository for the Triton language and compiler
Kimi K2 is the large language model series developed by Moonshot AI team
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
WaferLLM: Large Language Model Inference at Wafer Scale
An intuitive and low-overhead instrumentation tool for Python
Chinese translation of The AWK Programming Language (AWK 程序设计语言, awkbook), typeset with LaTeX
A local-first, cross-platform note-taking app leveraging the Typst ecosystem. Designed to minimize distractions and enhance the retention of information.
A next-generation C++ language server for modern C++, focused on high performance and deep code intelligence
[ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
Optimized primitives for collective multi-GPU communication
Fast and memory-efficient exact attention
Unified KV Cache Compression Methods for Auto-Regressive Models
A Neovim plugin for rendering Typst inline using the kitty unicode graphics protocol
[ICLR 2025 Oral] PyTorch code for the paper "Open-World Reinforcement Learning over Long Short-Term Imagination"