Stars
Code and notes for the six major CUDA parallel computing patterns
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
This is a place for various problem detectors running on the Kubernetes nodes.
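AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.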
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
bytedance / ps-lite
Forked from dmlc/ps-lite
A lightweight parameter server interface
Tensors and Dynamic neural networks in Python with strong GPU acceleration
oneAPI Collective Communications Library (oneCCL)
Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
MSCCL++: A GPU-driven communication stack for scalable AI applications
Ongoing research training transformer models at scale
Example models using DeepSpeed
Optimized primitives for collective multi-GPU communication
DeepEP: an efficient expert-parallel communication library
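Compiled in 2023: a comprehensive collection of Qt development interview questions, covering networking, file systems, databases, and custom controls, with video explanations and documentation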
Training/Evaluation code for HoVer-NeXt
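A game server framework whose network layer is implemented in three ways: with the Socket API, Boost Asio, and Libuv. The framework uses shared memory, lock-free queues, object pools, and memory pools to improve server performance. It also includes a continually improving Unity 3D client with a large set of complete assets (mounts, pets, companions, equipment), all of which can already be deployed or equipped and taken into dungeon battles. Multiplayer gameplay is also implemented. Under active development.
Compiled in 2023: 1,000 high-quality blog posts on C++ backend development, covering memory, networking, architecture design, high performance, data structures, core components, middleware, and distributed systems
System Design Interview: An Insider’s Guide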