Fastest kernels written from scratch

Cuda 519 61 Updated Sep 18, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 4,605 386 Updated Jan 11, 2026

All-in-one guide to getting a tech job abroad 🌎

4,213 437 Updated Oct 6, 2025

Directory of Fortran codes on GitHub, arranged by topic

369 70 Updated Jan 10, 2026

Supercomputing @ GT has compiled a list of organizations that offer internships and experiences in HPC and applications of HPC.

83 5 Updated Sep 2, 2025

Performance-Optimized AI Inference on Your GPUs. Unlock it by selecting and tuning the optimal inference engine for your model.

Python 4,363 443 Updated Jan 9, 2026

Lightning fast C++/CUDA neural network framework

C++ 4,381 535 Updated Dec 14, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 6,672 855 Updated Jan 11, 2026

OpenHPC Integration, Packaging, and Test Repo

C 958 203 Updated Jan 11, 2026

Computes spectral energy distributions from radiatively inefficient accretion flows around black holes. GPU-accelerated, CUDA (alpha, unstable)

C 2 2 Updated Nov 12, 2023

grmonty: relativistic Monte Carlo code

C 52 14 Updated Nov 3, 2024

A collection of (mostly) technical things every software developer should know about

97,460 8,625 Updated Dec 29, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,306 916 Updated Jan 7, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,052 792 Updated Jan 6, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 8,875 1,056 Updated Dec 29, 2025

An LLM-based AI agent that can automatically write correct and efficient GPU kernels.

Python 54 11 Updated Jan 9, 2026

Implementation of high-performance diffusion transformer from scratch in CUDA/C++

Jupyter Notebook 7 Updated Feb 16, 2025

An evolving how-to guide for securing a Linux server.

24,331 1,564 Updated Oct 19, 2024

Invoicing, Time tracking, File reconciliation, Storage, Financial Overview & your own Assistant made for Freelancers

TypeScript 13,544 1,289 Updated Jan 11, 2026

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy. A frontier, first-principles handbook inspi…

Python 8,169 922 Updated Nov 15, 2025

Quickly render fractals in CUDA

Python 65 5 Updated Jul 26, 2025

Samples of good AI-generated CUDA kernels

Python 99 10 Updated May 30, 2025

Collection of utilities for CUDA programming

Cuda 15 2 Updated Aug 4, 2025

An Unreal Engine editor plugin for deep asset analysis to maintain pipeline integrity.

C++ 29 1 Updated Jul 29, 2025

This repository contains a curated collection of 300+ case studies from over 80 companies, detailing practical applications and insights into machine learning (ML) system design. The contents are o…

8,010 1,135 Updated Aug 5, 2025

Deep learning at the speed of light.

Rust 2,682 183 Updated Jan 12, 2026

A curated list of engineering blogs

Ruby 36,783 1,931 Updated Aug 21, 2024

List of open-source alternatives to everyday SaaS products.

7,905 316 Updated Nov 8, 2024