seb-sep

Starred repositories

Cuda 123 16 Updated Oct 22, 2025

Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO

C++ 1,871 75 Updated Sep 10, 2025

Tile primitives for speedy kernels

Cuda 2,877 194 Updated Nov 9, 2025

Fast low-bit matmul kernels in Triton

Python 393 29 Updated Oct 26, 2025

lsblk in Go for Apple computers

Go 10 Updated Nov 3, 2024

A categorized list of C++ resources.

5,137 524 Updated Nov 10, 2025

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,732 272 Updated Jul 18, 2025

CUDA on non-NVIDIA GPUs

Rust 13,411 849 Updated Nov 10, 2025

SPIRV-Cross is a practical tool and library for performing reflection on SPIR-V and disassembling SPIR-V back to high level languages.

GLSL 2,311 616 Updated Nov 7, 2025

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 2,859 112 Updated Nov 10, 2025

Whisper with Medusa heads

Python 863 53 Updated Aug 6, 2025

A high-performance, zero-overhead, extensible Python compiler with built-in NumPy support

Python 16,014 558 Updated Nov 10, 2025

Convert images and video to ASCII!

Zig 489 24 Updated Sep 2, 2025

FlashAttention (Metal Port)

Swift 549 34 Updated Sep 22, 2024

Everything we actually know about the Apple Neural Engine (ANE)

2,309 85 Updated Oct 21, 2025

Apple GPU microarchitecture

Metal 559 27 Updated Sep 22, 2024

LLM101n: Let's build a Storyteller

35,504 1,933 Updated Aug 1, 2024

Efficient Triton Kernels for LLM Training

Python 5,818 428 Updated Nov 8, 2025

LLM training in simple, raw C/Metal Shading Language

Cuda 60 4 Updated Apr 24, 2024

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,806 298 Updated Nov 9, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,087 733 Updated Oct 31, 2025

Dolphin for iOS, reborn

C++ 413 57 Updated Oct 23, 2025

A port of https://www.github.com/n64decomp/sm64 for modern devices.

C 1,135 176 Updated Nov 15, 2024

ONNX Serving is a project written in C++ to serve onnx-mlir compiled models with gRPC and other protocols. Benefiting from the C++ implementation, ONNX Serving has very low latency overhead and high t…

C++ 25 4 Updated Sep 17, 2025

A Super Mario 64 decompilation, brought to you by a bunch of clever folks.

C 8,265 1,490 Updated Feb 4, 2024

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,942 149 Updated Nov 11, 2025

seqax = sequence modeling + JAX

Python 168 16 Updated Jul 23, 2025

Official implementation of Half-Quadratic Quantization (HQQ)

Python 889 87 Updated Oct 24, 2025

A CocoaPods plugin to add SPM dependencies to CocoaPods-based projects

Ruby 86 13 Updated Sep 7, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,653 4,616 Updated Nov 8, 2025