YeSho-cpp
  • Nanchang University

Code and notes for the 6 major CUDA parallel computing patterns

Cuda 61 9 Updated Jul 30, 2020

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 346 27 Updated Oct 16, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,145 81 Updated Aug 28, 2025

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 305 63 Updated Sep 2, 2025

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 39,375 6,784 Updated Oct 18, 2025

This is a place for various problem detectors running on the Kubernetes nodes.

Go 3,256 677 Updated Oct 14, 2025

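AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the software stack that supports training and inference of large AI models.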

Jupyter Notebook 4,760 674 Updated Oct 18, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 18,865 1,851 Updated Oct 6, 2025

NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.

C++ 121 15 Updated Nov 15, 2023

DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.

Python 62 3 Updated Oct 9, 2025

A lightweight parameter server interface

C++ 83 28 Updated Jan 13, 2023
C++ 307 26 Updated Oct 1, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 94,030 25,599 Updated Oct 18, 2025

Nano vLLM

Python 7,107 910 Updated Aug 31, 2025

oneAPI Collective Communications Library (oneCCL)

C++ 245 89 Updated Sep 24, 2025

Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 10,392 1,066 Updated Apr 30, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,177 516 Updated Sep 23, 2025

NCCL Tests

Cuda 1,296 321 Updated Oct 2, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 423 70 Updated Oct 18, 2025

Ongoing research training transformer models at scale

Python 13,871 3,162 Updated Oct 18, 2025

Example models using DeepSpeed

Python 6,695 1,109 Updated Oct 15, 2025

RDMA core userspace libraries and daemons

C 1,987 788 Updated Sep 21, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,158 1,041 Updated Oct 18, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,616 958 Updated Oct 17, 2025

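Compiled in 2023: a comprehensive collection of Qt development interview questions, covering networking, file systems, databases, and custom controls, plus video walkthroughs and documentation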

391 90 Updated May 20, 2024

Training/Evaluation code for HoVer-NeXt

Python 14 6 Updated Jan 6, 2025

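A game server framework whose network layer is implemented three ways: Socket API, Boost Asio, and Libuv. The framework uses shared memory, lock-free queues, object pools, and memory pools to improve server performance. It also includes a continuously improved Unity 3D client with a large set of complete assets (mounts, pets, companions, and equipment), all of which can already be deployed and equipped. Players can enter dungeon battles, and multiplayer modes are also implemented. Under active development.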

C++ 1,539 510 Updated Apr 28, 2025

k8s tutorials

Go 5,545 613 Updated Aug 18, 2025

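Compiled in 2023: C++ backend development, 1,000 excellent blog posts covering memory, networking, architecture design, high performance, data structures, basic components, middleware, and distributed systems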

1,961 420 Updated Mar 17, 2023

System Design Interview: An Insider’s Guide

2,209 333 Updated Jul 31, 2025