  • Mangoboost
  • Seoul


Framework providing operating system abstractions and a range of shared networking and memory services for common modern heterogeneous platforms.

SystemVerilog 317 91 Updated Dec 22, 2025

Perplexity open source garden for inference technology

Rust 310 26 Updated Dec 9, 2025

Linux Cross-Memory Attach

C 97 38 Updated Sep 11, 2024

Modular RDMA Interface

C++ 67 15 Updated Dec 24, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,542 979 Updated Dec 13, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,993 778 Updated Dec 23, 2025
HTML 227 47 Updated Dec 5, 2025
C++ 91 30 Updated Aug 27, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 6,422 814 Updated Dec 24, 2025

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 395 38 Updated Aug 13, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,161 395 Updated Jul 11, 2024

LLaMA 2 implemented from scratch in PyTorch

Python 363 68 Updated Sep 25, 2023

[Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️ Enables running PyTorch models on Tenstorrent hardware using eager or compile path

Python 61 25 Updated Dec 18, 2025

A validation and profiling tool for AI infrastructure

Python 352 80 Updated Dec 21, 2025

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,300 180 Updated Dec 17, 2025

Merlin Models is a collection of deep learning recommender system model reference implementations

Python 293 54 Updated May 4, 2024

A LogGOPS (LogP, LogGP, LogGPS) Simulator and Simulation Framework

C 14 7 Updated Aug 20, 2024

DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.

Python 52 15 Updated Dec 4, 2025

Fully open reproduction of DeepSeek-R1

Python 25,749 2,407 Updated Nov 24, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,470 479 Updated Dec 24, 2025

Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.

Go 38 26 Updated Dec 20, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 21,929 3,853 Updated Dec 24, 2025

An extremely fast Python package and project manager, written in Rust.

Rust 75,539 2,379 Updated Dec 23, 2025

To develop Arm Cortex-M0 based SoCs, from creating high-level functional specifications to design, implementation and testing on FPGA platforms using standard hardware description and software prog…

Verilog 35 7 Updated Dec 24, 2020

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,506 696 Updated Dec 23, 2025
C 639 56 Updated Dec 18, 2024