- Menlo Park
- in/lucca-bertoncini
- @lbz____
Stars
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 17+ clouds, or on-prem).
RDMA and SHARP plugins for the NCCL library
NVSentinel is a cross-platform fault remediation service designed to rapidly remediate runtime node-level issues in GPU-accelerated computing environments
Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.
Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2 - DeepSeek 670B MoE, GPTOSS
Hydra is a framework for elegantly configuring complex applications
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data generation.
A Slurm cluster using docker-compose
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
A modern Python application packaging and distribution tool
Optimized primitives for collective multi-GPU communication
~2000 Elo Python Chess Engine that implements: Negamax, PeSTO’s Evaluation, Null Move, Quiescence Search, Lazy SMP.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Clusterscope is a CLI and Python library to extract information from HPC clusters and jobs.
GPU Node ID (gni): hashes all GPU IDs in a node: `hash(GPU₀ + GPU₁ + ... + GPUₙ)`
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
lm-sensor and psutil alternatives for Python, written from scratch
Python 3.8+ toolbox for submitting jobs to Slurm