Stars
Cluster data collected from production clusters in Alibaba for cluster management research
DeepEP: an efficient expert-parallel communication library
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Problems, editorials, source code, and other materials for the "競プロ典型 90 問" (90 Typical Competitive Programming Problems) project, held from 2021/3/30 to 2021/7/12.
Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Tongyi Deep Research, the Leading Open-source Deep Research Agent
A simple Python sandbox for helpful LLM data agents
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
Scalable toolkit for efficient model reinforcement
Achieve state of the art inference performance with modern accelerators on Kubernetes
Large Language Model Text Generation Inference
Robust recipes to align language models with human and AI preferences
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718