
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,874 305 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,867 739 Updated Oct 15, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,851 897 Updated Sep 30, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,930 286 Updated May 15, 2025

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,823 134 Updated Jan 17, 2025
Python 40 3 Updated Jun 5, 2024

DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.

Python 1,891 133 Updated Dec 6, 2024
Python 5 Updated Oct 20, 2024

A flexible and efficient training framework for large-scale alignment tasks

Python 437 36 Updated Oct 23, 2025

BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Jupyter Notebook 2,685 733 Updated Oct 14, 2025

PyTorch distributed training acceleration framework

Python 53 9 Updated Aug 13, 2025

Efficient and easy multi-instance LLM serving

Python 506 42 Updated Sep 3, 2025

TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.

C++ 97 9 Updated Apr 22, 2023

Fast and memory-efficient exact attention

Python 20,414 2,125 Updated Nov 5, 2025

DaCe - Data Centric Parallel Programming

Python 557 145 Updated Nov 8, 2025

Research and development for optimizing transformers

Python 131 17 Updated Feb 16, 2021

Development repository for the Triton language and compiler

MLIR 17,508 2,368 Updated Nov 9, 2025
Python 9 Updated Oct 10, 2022

A C++ standalone library for machine learning

C++ 5,418 501 Updated Oct 11, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,660 679 Updated Nov 9, 2025

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python 1,066 129 Updated Apr 17, 2024

The hacker's browser.

JavaScript 25,590 2,534 Updated Nov 5, 2025

EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit

Python 2,172 257 Updated Nov 27, 2024

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Python 2,538 249 Updated Apr 24, 2024

A framework for large scale recommendation algorithms.

Python 2,175 366 Updated Nov 7, 2025

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Python 270 49 Updated Mar 31, 2023

FastNN provides distributed training examples that use EPL.

Python 84 19 Updated Mar 11, 2022

An Industrial Graph Neural Network Framework

C++ 1,331 267 Updated Jul 4, 2025

GPU-scheduler-for-deep-learning

C++ 210 36 Updated Nov 5, 2020

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integra…

Python 22,878 4,976 Updated Nov 8, 2025