Python 1 Updated Jan 28, 2024

A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).

Python 30 1 Updated Mar 4, 2025

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 535 49 Updated Sep 13, 2025

A simple Transformer model implemented in C++. Attention Is All You Need.

C++ 52 9 Updated Mar 11, 2021

yecao100's bulletin board

JavaScript 188 11 Updated Jul 28, 2022

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,256 1,896 Updated Nov 28, 2025

A large-scale simulation framework for LLM inference

Python 486 91 Updated Jul 25, 2025

LLMPerf is a library for validating and benchmarking LLMs

Python 1,051 197 Updated Dec 9, 2024

Distributed LLM and StableDiffusion inference for mobile, desktop and server.

Rust 2,893 171 Updated Oct 23, 2024

Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and other interesting topics).

137 6 Updated Nov 4, 2025

Open5GS 5GC & UERANSIM UE / RAN Sample Configuration

77 20 Updated May 5, 2024

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,760 324 Updated Nov 28, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 584 71 Updated Sep 11, 2024

Model Context Protocol (MCP) programming quick start

3,134 187 Updated Apr 23, 2025

AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.

TypeScript 10,027 925 Updated Nov 28, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,995 1,165 Updated Nov 28, 2025

LLM Inference benchmark

Python 431 40 Updated Jul 23, 2024

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 80,017 11,910 Updated Nov 25, 2025

An open protocol enabling communication and interoperability between opaque agentic applications.

Shell 20,838 2,127 Updated Nov 26, 2025

LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey | Awesome Human-Agent Collaboration | Human-AI Collaboration

170 6 Updated Nov 24, 2025

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.

Python 3,103 272 Updated Nov 19, 2025

A lightweight, powerful framework for multi-agent workflows

Python 17,567 2,918 Updated Nov 27, 2025

A collection of MCP servers.

75,799 6,390 Updated Nov 28, 2025

DGL Chinese documentation. This is the Chinese manual of the graph neural network library DGL; it currently contains the User Guide.

77 14 Updated Feb 16, 2022

No fortress, purely open ground. OpenManus is Coming.

Python 51,037 8,904 Updated Nov 17, 2025
Python 16 Updated Dec 17, 2023
OpenEdge ABL 3 3 Updated Oct 23, 2023

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,334 446 Updated Nov 28, 2025

LLM serving cluster simulator

Jupyter Notebook 122 12 Updated Apr 25, 2024