Stars
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
A comprehensive toolkit for GPU Communications Libraries performance testing and data analysis.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A tool for bandwidth measurements on NVIDIA GPUs.
Optimized primitives for collective multi-GPU communication.
This is a set of simple programs that can be used to explore the features of a parallel platform.
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
Seamlessly invoke Amazon Bedrock or your custom models, enabling a smooth experience with AWS GenAI services.
A multi-platform experimentation framework written in Python.
A validation and profiling tool for AI infrastructure.
Contains example recipes that demonstrate how to build HPC systems using AWS services and solutions.
This repository contains HPC application best practices, specifically designed and optimized to run on AWS.
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
System performance analysis and characterization tool.
A CLI tool to gather performance data and visualize using HTML graphs. Data from multiple collection runs can be viewed side-by-side, allowing for easy comparison of the same workload across differ…
Research and Engineering Studio (RES) is an AWS-supported open source product that enables IT administrators to provide an easy-to-use web portal for scientists and engineers to run technical compu…
Dragon distributed runtime for HPC and AI applications and workflows.
The Chef cookbook used to build and bootstrap AWS ParallelCluster.
Scripts to collect data for collectives selection tuning.