YeSho-cpp


Student of Nanchang University

1 follower · 16 following



Stars

Syencil / Programming_Massively_Parallel_Processors

Code and notes for the six major CUDA parallel computing patterns

Cuda 61 9 Updated Jul 30, 2020

NVIDIA / nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 346 27 Updated Oct 16, 2025

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,145 81 Updated Aug 28, 2025

FZJ-JSC / tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 305 63 Updated Sep 2, 2025

ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 39,375 6,784 Updated Oct 18, 2025

kubernetes / node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.

Go 3,256 677 Updated Oct 14, 2025

Infrasys-AI / AIInfra

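AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper software stack, that supports training and inference of large AI models.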

Jupyter Notebook 4,760 674 Updated Oct 18, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 18,865 1,851 Updated Oct 6, 2025

google / nccl-fastsocket

NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.

C++ 121 15 Updated Nov 15, 2023

antgroup / DeepXTrace

DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.

Python 62 3 Updated Oct 9, 2025

bytedance / ps-lite

Forked from dmlc/ps-lite

A lightweight parameter server interface

C++ 83 28 Updated Jan 13, 2023

stepfun-ai / StepMesh

C++ 307 26 Updated Oct 1, 2025

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 94,030 25,599 Updated Oct 18, 2025

GeeeekExplorer / nano-vllm

Nano vLLM

Python 7,107 910 Updated Aug 31, 2025

uxlfoundation / oneCCL

oneAPI Collective Communications Library (oneCCL)

C++ 245 89 Updated Sep 24, 2025

wdndev / llm_interview_note

Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 10,392 1,066 Updated Apr 30, 2025

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 5,177 516 Updated Sep 23, 2025

NVIDIA / nccl-tests

NCCL Tests

Cuda 1,296 321 Updated Oct 2, 2025

microsoft / mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 423 70 Updated Oct 18, 2025

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 13,871 3,162 Updated Oct 18, 2025

deepspeedai / DeepSpeedExamples

Example models using DeepSpeed

Python 6,695 1,109 Updated Oct 15, 2025

linux-rdma / rdma-core

RDMA core userspace libraries and daemons

C 1,987 788 Updated Sep 21, 2025

NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

C++ 4,158 1,041 Updated Oct 18, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,616 958 Updated Oct 17, 2025

0voice / qt_interview_reference

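Newly compiled in 2023: the most complete collection of Qt development interview questions, covering networking, file systems, databases, and custom controls, with video explanations and documentation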

391 90 Updated May 20, 2024

digitalpathologybern / hover_next_train

Training/Evaluation code for HoVer-NeXt

Python 14 6 Updated Jan 6, 2025

ylmbtm / GameProject3

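A game server framework whose network layer is implemented in three ways: with the socket API, Boost Asio, and Libuv. The framework uses shared memory, lock-free queues, object pools, and memory pools to improve server performance. It also includes a continually improving Unity 3D client with a large set of complete assets: mounts, pets, partners, and equipment, all of which can already be deployed or equipped and taken into dungeon battles. Multiplayer gameplay is also implemented. Under active development.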

C++ 1,539 510 Updated Apr 28, 2025

guangzhengli / k8s-tutorials

k8s tutorials

Go 5,545 613 Updated Aug 18, 2025

0voice / cpp_backend_awsome_blog

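Newly compiled in 2023: 1,000 excellent blog posts on C++ backend development, covering memory, networking, architecture design, high performance, data structures, basic components, middleware, and distributed systems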

1,961 420 Updated Mar 17, 2023

Admol / SystemDesign

System Design Interview: An Insider’s Guide

2,209 333 Updated Jul 31, 2025
