Stars
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
A list of awesome compiler projects and papers for tensor computation and deep learning.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Distributed Compiler based on Triton for Parallel Systems
Community maintained hardware plugin for vLLM on Ascend
FlagGems is an operator library for large language models implemented in the Triton Language.
How to optimize some algorithms in CUDA.
A Datacenter Scale Distributed Inference Serving Framework
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework
🚀 Efficient implementations of state-of-the-art linear attention models
My learning notes/codes for ML SYS.
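AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the software stack that supports training and inference of large AI models.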
Simply change your app's icon on macOS. Just a click.
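go-musicfox is yet another NetEase Cloud Music command-line client written in Go, supporting UnblockNeteaseMusic, multiple audio quality levels, lastfm, MPRIS, and macOS interaction (pausing on sleep, responding to Bluetooth headphone connect/disconnect, menu bar controls, etc.)...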
🔍 Quick file search & app launcher for Windows with community-made plugins
Just like TextEdit on Mac but dedicated to Markdown.
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
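Latest 2021 compilation of C++ learning resources, covering new features in C++11/14/17/20/23, beginner tutorials, recommended books, quality articles, study notes, teaching videos, and more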
Ongoing research training transformer models at scale
Making large AI models cheaper, faster and more accessible
A markup-based typesetting system that is powerful and easy to learn.
An easy way to uninstall Microsoft AutoUpdate on macOS.
A privacy-first, open-source platform for knowledge management and collaboration. Download link: http://github.com/logseq/logseq/releases. roadmap: http://trello.com/b/8txSM12G/roadmap
Lightning-fast and Powerful Code Editor written in Rust
This is the unofficial LaTeX class for Master/Ph.D. Thesis Template of Huazhong University of Science and Technology