- Mangoboost
- Seoul
- UTC +09:00
Stars
Framework providing operating system abstractions and a range of shared networking and memory services for common modern heterogeneous platforms.
Perplexity open source garden for inference technology
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Supercharge Your LLM with the Fastest KV Cache Layer
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
LLaMA 2 implemented from scratch in PyTorch
[Deprecated] ⭐️ TT-NN Compiler for PyTorch 2 ⭐️. Enables running PyTorch models on Tenstorrent hardware using the eager or compile path.
A validation and profiling tool for AI infrastructure
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Merlin Models is a collection of deep learning recommender system model reference implementations
A LogGOPS (LogP, LogGP, LogGPS) Simulator and Simulation Framework
DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.
Fully open reproduction of DeepSeek-R1
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Device Metrics Exporter exports metrics from AMD devices (GPUs) to collectors like Prometheus.
SGLang is a fast serving framework for large language models and vision language models.
An extremely fast Python package and project manager, written in Rust.
To develop Arm Cortex-M0 based SoCs, from creating high-level functional specifications to design, implementation and testing on FPGA platforms using standard hardware description and software prog…
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/