ZillaRU

Starred repositories

REverse-Engineered Reasoning for Open-Ended Generation

Python 83 6 Updated Sep 10, 2025
Python 64 6 Updated Nov 22, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)

Python 683 93 Updated Nov 21, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,733 271 Updated Nov 28, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,734 696 Updated Nov 28, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,851 4,651 Updated Nov 26, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,654 850 Updated Nov 28, 2025

Spark-TTS Inference Code

Python 10,747 1,146 Updated Apr 9, 2025

collection of benchmarks to measure basic GPU capabilities

C++ 460 68 Updated Oct 24, 2025

Resources on LLM-based applications built with the RAG pattern

1,585 102 Updated Nov 4, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 632 51 Updated Apr 8, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 15,956 1,875 Updated Nov 7, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,781 4,015 Updated Nov 28, 2025

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Python 847 88 Updated Sep 16, 2025

Official code repo for the O'Reilly Book - "Hands-On Large Language Models"

Jupyter Notebook 18,105 4,272 Updated Jul 21, 2025

how to learn PyTorch and OneFlow

460 28 Updated Mar 22, 2024

A collection of compiler learning resources.

Python 2,597 360 Updated Mar 19, 2025

Puzzles for learning Triton

Jupyter Notebook 2,140 175 Updated Nov 18, 2024

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,759 284 Updated Nov 28, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,540 376 Updated Nov 27, 2025
Jupyter Notebook 150 13 Updated Jul 4, 2025

Development repository for the Triton language and compiler

MLIR 17,697 2,412 Updated Nov 27, 2025

A high performance and generic framework for distributed DNN training

Python 3,713 494 Updated Oct 3, 2023

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 5,764 667 Updated Jun 4, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 605 72 Updated Oct 14, 2025

Ring attention implementation with flash attention

Python 923 88 Updated Sep 10, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,437 291 Updated Nov 28, 2025

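AIInfra (AI infrastructure) covers the AI system stack, from low-level hardware such as chips up to the software stack that supports training and inference of large AI models.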

Jupyter Notebook 5,214 726 Updated Nov 21, 2025

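💯 Exam preparation materials for the 2025 System Architecture Designer certification (senior level of the Chinese national software qualification exam, "Ruankao").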

HTML 8,105 2,029 Updated Nov 5, 2025
Python 1 Updated Apr 22, 2025