Stars
How to optimize algorithms in CUDA.
🔮 ChatGPT Desktop Application (Mac, Windows and Linux)
A list of awesome compiler projects and papers for tensor computation and deep learning.
Elixir: Train a Large Language Model on a Small GPU Cluster
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🔥Highlighting the top ML papers every week.
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
Development repository for the Triton language and compiler
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
A framework for managing and maintaining multi-language pre-commit hooks.
IdeaVim – A Vim engine for JetBrains IDEs
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
LaTeX template for undergraduate theses at the School of EECS, Peking University
Scalable PaLM implementation in PyTorch
Examples of training models with hybrid parallelism using ColossalAI
Performance benchmarking with ColossalAI
Sky Computing: Accelerating Geo-distributed Computing in Federated Learning
Optimized primitives for collective multi-GPU communication
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Making large AI models cheaper, faster and more accessible
Ongoing research training transformer models at scale