View yikangshen's full-sized avatar

Python 22 2 Updated Mar 7, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,932 286 Updated May 15, 2025

An AI Hedge Fund Team

Python 42,479 7,537 Updated Nov 13, 2025

The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention

Python 3,250 313 Updated Jul 7, 2025

LM engine is a library for pretraining/finetuning LLMs

Python 77 22 Updated Nov 28, 2025

[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Python 930 46 Updated Nov 16, 2025

A PyTorch native platform for training generative AI models

Python 4,769 615 Updated Nov 25, 2025

Efficient, check-pointed data loading for deep learning with massive data sets.

Python 210 17 Updated Jun 12, 2023

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

1,241 88 Updated Jun 25, 2025

A framework for few-shot evaluation of autoregressive language models.

Python 24 9 Updated Dec 21, 2023

Reaching LLaMA2 Performance with 0.1M Dollars

Python 987 78 Updated Jul 23, 2024

Here we will test various linear attention designs.

Python 62 14 Updated Apr 25, 2024

Triton-based implementation of Sparse Mixture of Experts.

Python 253 22 Updated Oct 3, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,936 315 Updated Nov 27, 2025

MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks

Jupyter Notebook 8,438 522 Updated Oct 8, 2025

https://acl2023-retrieval-lm.github.io/

JavaScript 158 15 Updated Oct 18, 2023

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python 2 Updated Mar 28, 2023

speech self-supervised representations

Python 514 39 Updated Apr 27, 2023

Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

C 945 106 Updated Nov 10, 2025

Awesome Lists for Tenure-Track Assistant Professors and PhD students. (Survival guide for assistant professors and PhD students)

Python 1,612 92 Updated Feb 1, 2024

Solve puzzles. Learn CUDA.

Jupyter Notebook 11,765 905 Updated Sep 1, 2024

Understanding the Difficulty of Training Transformers

Python 332 19 Updated May 31, 2022

A collection of AWESOME things about mixture-of-experts

1,231 81 Updated Dec 8, 2024

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Python 8,285 963 Updated Feb 25, 2022

PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

Python 1,206 110 Updated Apr 19, 2024

A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Python 834 66 Updated Sep 13, 2023