Stars
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention
LM engine is a library for pretraining/finetuning LLMs
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
A PyTorch native platform for training generative AI models
Efficient, check-pointed data loading for deep learning with massive data sets.
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
A framework for few-shot evaluation of autoregressive language models.
Reaching LLaMA2 Performance with 0.1M Dollars
Here we will test various linear attention designs.
Triton-based implementation of Sparse Mixture of Experts.
Efficient implementations of state-of-the-art linear attention models
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3x+ generation speedup on reasoning tasks
https://acl2023-retrieval-lm.github.io/
ZheyuAqaZhang / transformers
Forked from huggingface/transformers. Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
speech self-supervised representations
Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
Awesome Lists for Tenure-Track Assistant Professors and PhD students.
Understanding the Difficulty of Training Transformers
A collection of AWESOME things about mixture-of-experts
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models