-
Megatron-LM Public
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Python Other Updated Dec 18, 2025 -
TransformerEngine Public
Forked from NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Python Apache License 2.0 Updated Nov 26, 2025 -
pytorch Public
Forked from pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
C++ Other Updated Oct 9, 2025 -
NeMo Public
Forked from NVIDIA-NeMo/NeMo
NeMo: a toolkit for conversational AI
Python Apache License 2.0 Updated Sep 2, 2025 -
grouped_gemm Public
Forked from fanshiqing/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
Cuda Apache License 2.0 Updated Jun 4, 2024 -
megabyte Public
A PyTorch implementation of MEGABYTE, a multi-scale transformer architecture that is tokenization-free and uses sub-quadratic attention. Paper: https://arxiv.org/abs/23…
-
shu Public
Collection and organization of Chinese books
-
do-we-need-attention Public
Forked from srush/do-we-need-attention
TeX MIT License Updated Oct 18, 2023 -
safari Public
Forked from HazyResearch/safari
Convolutions for Sequence Modeling
Assembly Apache License 2.0 Updated Oct 16, 2023 -
blueprint-trainer Public
Scaffolding for sequence model training research.
-
apex Public
Forked from NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
Python BSD 3-Clause "New" or "Revised" License Updated Aug 24, 2023 -
c4-dataset-script Public
Inspired by Google's C4, this is a series of Colossal Clean data cleaning scripts focused on CommonCrawl data processing, including Chinese data processing and the cleaning methods used in MassiveText.
-
MEGABYTE-pytorch Public
Forked from lucidrains/MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Python MIT License Updated May 31, 2023 -
tinygrad Public
Forked from tinygrad/tinygrad
You like pytorch? You like micrograd? You love tinygrad! ❤️
Python MIT License Updated May 29, 2023 -
hyena-jax Public
Forked from irhum/hyena
JAX/Flax implementation of the Hyena Hierarchy
Jupyter Notebook MIT License Updated Apr 27, 2023 -
BLOOM-COT Public
Forked from bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Python Other Updated Apr 19, 2023 -
gpt-neox Public
Forked from EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Python Apache License 2.0 Updated Apr 14, 2023 -
RWKV-LM Public
Forked from BlinkDL/RWKV-LM
RWKV is a RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, …
Python Apache License 2.0 Updated Feb 7, 2023 -
bagua-core Public
Forked from BaguaSys/bagua-core
Core communication lib for Bagua.
Rust MIT License Updated Dec 19, 2022 -
GLM-130B Public
Forked from zai-org/GLM-130B
GLM-130B: An Open Bilingual Pre-Trained Model
Python Apache License 2.0 Updated Oct 11, 2022 -
transformers Public
Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python Apache License 2.0 Updated Oct 3, 2022 -
OptimalShardedDataParallel Public
Forked from Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel
An automated parallel training system that combines the advantages from both data and model parallelism. If you have any interests, please visit/star/fork https://github.com/Youhe-Jiang/OptimalShar…
Python Updated Sep 28, 2022 -
juicefs Public
Forked from juicedata/juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Go Apache License 2.0 Updated Sep 6, 2022 -
TimeChamber Public
Forked from inspirai/TimeChamber
A Massively Parallel Large Scale Self-Play Framework
Python MIT License Updated Sep 6, 2022