Stars
An End-to-End Infrastructure for Training and Evaluating Various LLM Agents
BurstEngine is an efficient framework designed to train LLMs on long-sequence data.
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLMs
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / U…
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
A low-latency, billion-scale, and updatable graph-based vector store on SSD.
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
This repository contains a simple web application that allows users to generate QR codes for URLs.
thunlp / Seq1F1B
Forked from NVIDIA/Megatron-LM. Sequence-level 1F1B schedule for LLMs.
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
Fourier Controller Networks (FCNet) for Real-Time Decision-Making in Embodied Learning, ICML 2024
Tile primitives for speedy kernels
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Achazwl / mlc
Forked from mlc-ai/mlc-llm. MiniCPM on Android platform.
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3×+ generation speedup on reasoning tasks