🤓
Coding

How to optimize some algorithms in CUDA.

Cuda 2,647 239 Updated Nov 27, 2025

🔮 ChatGPT Desktop Application (Mac, Windows and Linux)

Rust 54,338 6,202 Updated Aug 29, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,686 322 Updated Oct 19, 2024

Explanation to key concepts in ML

8,125 663 Updated Jun 30, 2025

Elixir: Train a Large Language Model on a Small GPU Cluster

Python 15 5 Updated Jun 8, 2023

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,853 4,652 Updated Nov 26, 2025

🔥Highlighting the top ML papers every week.

12,112 750 Updated Jul 20, 2025

Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍

Shell 23,847 3,523 Updated Nov 13, 2025

Development repository for the Triton language and compiler

MLIR 17,698 2,413 Updated Nov 27, 2025

A collection of models built with ColossalAI

Python 32 16 Updated Nov 22, 2022

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda 911 333 Updated Aug 19, 2024

A framework for managing and maintaining multi-language pre-commit hooks.

Python 14,631 910 Updated Nov 25, 2025

IdeaVim – A Vim engine for JetBrains IDEs

Kotlin 10,032 800 Updated Nov 28, 2025

A correctness test for ViT on CIFAR-10.

Python 1 Updated Sep 9, 2022

Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

Python 30,534 3,623 Updated Nov 28, 2025

LaTeX template for undergraduate theses at the School of EECS, Peking University

TeX 41 9 Updated Jun 3, 2022

Scalable PaLM implementation in PyTorch

Python 189 27 Updated Dec 19, 2022

Python 1,616 144 Updated Apr 27, 2023

Examples of training models with hybrid parallelism using ColossalAI

Python 339 102 Updated Mar 23, 2023

Performance benchmarking with ColossalAI

Python 38 16 Updated Jul 6, 2022

Sky Computing: Accelerating Geo-distributed Computing in Federated Learning

Python 91 21 Updated Nov 22, 2022

A self-study guide to computer science

HTML 69,582 7,744 Updated Nov 28, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,274 1,078 Updated Nov 10, 2025

NCCL Tests

Cuda 1,344 334 Updated Nov 21, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 95,444 26,038 Updated Nov 29, 2025

Making large AI models cheaper, faster and more accessible

Python 41,274 4,541 Updated Nov 24, 2025

Ongoing research training transformer models at scale

Python 14,349 3,322 Updated Nov 28, 2025