View stephenmsachs's full-sized avatar
💭
Vacation until 8/15
  • Berlin
  • 02:44 (UTC +01:00)


NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.

C++ 122 15 Updated Nov 15, 2023

A comprehensive toolkit for GPU Communications Libraries performance testing and data analysis.

Python 4 Updated Oct 10, 2025

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 377 37 Updated Oct 16, 2025

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 39,783 6,893 Updated Nov 12, 2025

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 564 64 Updated Apr 15, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,217 1,064 Updated Nov 10, 2025

CloudAI Benchmark Framework

Python 73 35 Updated Nov 10, 2025

NCCL Tests

Cuda 1,330 328 Updated Nov 3, 2025

This is a set of simple programs that can be used to explore the features of a parallel platform.

C 464 114 Updated Aug 28, 2025

Open Fabric Interfaces

C 721 452 Updated Nov 11, 2025

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ 3,336 326 Updated Nov 11, 2025

Seamlessly invoke Amazon Bedrock or your custom models, enabling a smooth experience with AWS GenAI services.

TypeScript 93 33 Updated Sep 30, 2025

A multi-platform experimentation framework written in Python.

Python 62 39 Updated Nov 11, 2025

A validation and profiling tool for AI infrastructure

Python 348 76 Updated Nov 10, 2025

Contains example recipes that demonstrate how to build HPC systems using AWS services and solutions.

Shell 85 34 Updated Oct 16, 2025

ldd as a tree

C 2,738 62 Updated Jun 21, 2024

This repository contains HPC application best practices, specifically designed and optimized to run on AWS.

Shell 18 2 Updated Jun 27, 2025

Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.

Shell 366 150 Updated Nov 12, 2025

System performance analysis and characterization tool

Go 411 52 Updated Nov 9, 2025

A CLI tool to gather performance data and visualize using HTML graphs. Data from multiple collection runs can be viewed side-by-side, allowing for easy comparison of the same workload across differ…

Rust 143 28 Updated Nov 11, 2025

Research and Engineering Studio (RES) is an AWS supported open source product that enables IT administrators to provide an easy-to-use web portal for scientists and engineers to run technical compu…

Python 103 27 Updated Sep 25, 2025

Dragon distributed runtime for HPC and AI applications and workflows

Python 86 9 Updated Nov 7, 2025

The Chef cookbook used to build and bootstrap AWS ParallelCluster

Ruby 111 109 Updated Nov 3, 2025

Scripts to collect data for collectives selection tuning

Python 8 10 Updated Jan 25, 2024

FFTW code optimized for AMD based processors

C 58 16 Updated May 8, 2025