waqasm86.github.io Public
Personal engineering portfolio showcasing CUDA + C++ + LLM inference projects. Features production-grade distributed systems, empirical performance research, and on-device AI optimization. Built wi…
Updated Dec 30, 2025
llcuda Public
CUDA-accelerated LLM inference for Python with automatic server management. Zero-configuration setup, JupyterLab-ready, production-grade performance. Just install and start running inference!
Python · MIT License · Updated Dec 30, 2025
Pre-built llama.cpp CUDA binary for Ubuntu 22.04. No compilation required: download, extract, and run. Works with the llcuda Python package for JupyterLab integration. Tested on GPUs ranging from the GeForce 940M to the RTX 4090.
cuda-nvidia-systems-engg Public
Production-grade C++20/CUDA distributed LLM inference system with TCP networking, MPI scheduling, and content-addressed storage. Features comprehensive benchmarking (p50/p95/p99 latencies), epoll a…
C++ · MIT License · Updated Dec 27, 2025
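The TCP layer in a system like this ultimately comes down to framing messages on a byte stream. The repository's actual wire format is not shown here; the sketch below illustrates one common approach, a 4-byte length prefix per message, demonstrated over a local socketpair so it runs without a network.

```cpp
// Minimal length-prefixed framing sketch: every message is a 4-byte
// big-endian length followed by the payload bytes. This is an assumed
// framing scheme for illustration, not the repository's actual protocol.
#include <arpa/inet.h>   // htonl / ntohl
#include <sys/socket.h>  // socketpair, send, recv
#include <sys/types.h>
#include <unistd.h>      // close
#include <cstdint>
#include <iostream>
#include <string>

// Send all bytes, looping until the kernel has accepted everything.
static bool send_all(int fd, const void* buf, size_t len) {
    const char* p = static_cast<const char*>(buf);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0) return false;
        p += n; len -= static_cast<size_t>(n);
    }
    return true;
}

// Receive exactly len bytes or fail.
static bool recv_all(int fd, void* buf, size_t len) {
    char* p = static_cast<char*>(buf);
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0) return false;
        p += n; len -= static_cast<size_t>(n);
    }
    return true;
}

static bool write_frame(int fd, const std::string& payload) {
    uint32_t len_be = htonl(static_cast<uint32_t>(payload.size()));
    return send_all(fd, &len_be, sizeof(len_be)) &&
           send_all(fd, payload.data(), payload.size());
}

static bool read_frame(int fd, std::string& out) {
    uint32_t len_be = 0;
    if (!recv_all(fd, &len_be, sizeof(len_be))) return false;
    out.resize(ntohl(len_be));
    return recv_all(fd, out.data(), out.size());
}

int main() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 1;
    write_frame(fds[0], "{\"prompt\":\"hello\"}");   // "client" side
    std::string msg;
    read_frame(fds[1], msg);                          // "server" side
    std::cout << "received frame: " << msg << "\n";
    close(fds[0]); close(fds[1]);
    return 0;
}
```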
local-llama-cuda Public
Custom CUDA implementation for LLM inference with MPI-based distributed computing. Memory-efficient layer offloading, multi-rank coordination, and GPU optimization for constrained hardware (1GB VRAM).
C++ · MIT License · Updated Dec 25, 2025
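The layer-offloading constraint is simple budget arithmetic: how many layers fit in VRAM after reserving space for the KV cache, scratch buffers, and CUDA context. The numbers below are illustrative placeholders, not measurements from this repository.

```cpp
// Back-of-the-envelope sketch: how many transformer layers fit on the GPU
// given a VRAM budget? All sizes here are assumed values for illustration.
#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t vram_budget_bytes = 1ull << 30;    // ~1 GB card
    const std::uint64_t reserved_bytes    = 200ull << 20;  // KV cache, scratch, CUDA context (assumed)
    const std::uint64_t bytes_per_layer   = 60ull << 20;   // e.g. one small quantized layer (assumed)
    const int           total_layers      = 32;

    std::uint64_t usable = vram_budget_bytes - reserved_bytes;
    int gpu_layers = static_cast<int>(usable / bytes_per_layer);
    if (gpu_layers > total_layers) gpu_layers = total_layers;

    std::cout << "offload " << gpu_layers << " of " << total_layers
              << " layers to the GPU; keep the rest on the host\n";
    return 0;
}
```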
cuda-tcp-llama.cpp Public
High-performance TCP inference gateway with epoll async I/O for CUDA-accelerated LLM serving. Binary protocol, connection pooling, streaming responses. Zero dependencies beyond POSIX and CUDA.
C++ · Updated Dec 23, 2025
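For context on the epoll pattern the gateway is built around, here is a bare-bones level-triggered accept/echo loop; the real gateway layers its binary protocol, connection pooling, and streaming responses on top of an event loop of this general shape.

```cpp
// Minimal epoll accept/echo loop (level-triggered). Illustrative only:
// a starting point for the async I/O pattern, not the gateway itself.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(8080);
    if (bind(listen_fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        perror("bind");
        return 1;
    }
    listen(listen_fd, SOMAXCONN);

    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[64];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                // New connection: register it with the same epoll instance.
                int client = accept(listen_fd, nullptr, nullptr);
                epoll_event cev{};
                cev.events = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
            } else {
                // Ready client socket: echo whatever arrived, close on EOF/error.
                char buf[4096];
                ssize_t len = read(fd, buf, sizeof(buf));
                if (len <= 0) {
                    epoll_ctl(ep, EPOLL_CTL_DEL, fd, nullptr);
                    close(fd);
                } else {
                    write(fd, buf, static_cast<size_t>(len));
                }
            }
        }
    }
}
```

An edge-triggered variant (EPOLLET) with non-blocking sockets scales better under load, at the cost of having to drain each socket fully on every wakeup.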
cuda-openmpi Public
CUDA-aware OpenMPI integration for GPU-accelerated distributed computing. Multi-GPU LLM inference with MPI communication, performance benchmarking, and collective operations testing.
Cuda · MIT License · Updated Dec 23, 2025
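A minimal shape for the collective-operations benchmarking mentioned above: time repeated MPI_Allreduce calls over host buffers. With a CUDA-aware OpenMPI build the same call can take device pointers directly; that variant is omitted here, so treat this purely as a timing-harness sketch, not the repository's benchmark.

```cpp
// Minimal MPI_Allreduce micro-benchmark with host buffers.
// Compile with mpicxx, run with mpirun -np <ranks>.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int count = 1 << 20;                 // 1M floats per rank (assumed size)
    std::vector<float> send(count, 1.0f), recv(count, 0.0f);
    const int iters = 50;

    MPI_Barrier(MPI_COMM_WORLD);               // align ranks before timing
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        MPI_Allreduce(send.data(), recv.data(), count,
                      MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0) {
        std::printf("%d ranks, %d floats: %.3f ms per allreduce\n",
                    size, count, 1e3 * elapsed / iters);
    }
    MPI_Finalize();
    return 0;
}
```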
cuda-llm-storage-pipeline Public
Content-addressed LLM model distribution with SHA256 verification and SeaweedFS integration. Distributed storage, manifest management, LRU caching, and integrity checking for GGUF models.
C++ · Updated Dec 23, 2025
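Content addressing means a model blob is keyed by the hash of its bytes, so the key doubles as the integrity check. The sketch below shows that idea with OpenSSL's EVP API; the pipeline's actual manifest handling and SeaweedFS integration are not reproduced here.

```cpp
// Content-addressing sketch: a blob's key is the hex SHA-256 of its bytes.
// Uses OpenSSL's EVP API (link with -lcrypto); illustrative only.
#include <openssl/evp.h>
#include <cstdio>
#include <string>

// Hash an in-memory buffer and return the lowercase hex digest.
std::string sha256_hex(const void* data, size_t len) {
    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned int md_len = 0;
    EVP_MD_CTX* ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha256(), nullptr);
    EVP_DigestUpdate(ctx, data, len);
    EVP_DigestFinal_ex(ctx, md, &md_len);
    EVP_MD_CTX_free(ctx);

    static const char* hex = "0123456789abcdef";
    std::string out;
    for (unsigned int i = 0; i < md_len; ++i) {
        out.push_back(hex[md[i] >> 4]);
        out.push_back(hex[md[i] & 0x0f]);
    }
    return out;
}

int main() {
    std::string blob = "pretend this is a GGUF shard";
    std::string key = sha256_hex(blob.data(), blob.size());
    // The digest doubles as the storage key and the integrity check:
    // re-hash after download and compare against the manifest entry.
    std::printf("object key: sha256/%s\n", key.c_str());
    return 0;
}
```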
cuda-mpi-llama-scheduler Public
Distributed MPI scheduler with work-stealing algorithm for LLM inference. Percentile latency analysis (p50/p95/p99), throughput benchmarking, multi-rank load balancing, and empirical performance me…
Cuda · Updated Dec 23, 2025
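The percentile reporting referred to above reduces to sorting per-request latencies and indexing by rank. The sketch below uses the nearest-rank method on made-up sample data; the scheduler's cross-rank aggregation is more involved.

```cpp
// Percentile latency reporting sketch: sort the per-request latencies and
// index by rank. Sample data is fabricated for illustration.
#include <algorithm>
#include <cstdio>
#include <vector>

double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    // Nearest-rank method: index of the p-th percentile in the sorted sample.
    size_t idx = static_cast<size_t>(p / 100.0 * (samples.size() - 1) + 0.5);
    return samples[idx];
}

int main() {
    std::vector<double> latency_ms = {12.1, 9.8, 15.3, 11.0, 48.7,
                                      10.4, 13.9, 11.8, 95.2, 12.6};
    std::printf("p50=%.1f ms  p95=%.1f ms  p99=%.1f ms\n",
                percentile(latency_ms, 50.0),
                percentile(latency_ms, 95.0),
                percentile(latency_ms, 99.0));
    return 0;
}
```

With only a handful of samples, p95 and p99 land on the same extreme values; meaningful tail-latency figures need thousands of requests per measurement window.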
cmake-superbuild-toolkit Public
Qt-style CMake superbuild demo: FetchContent deps, feature flags, install/export targets, CI matrix, tests, and CPack packaging.
CMake · Other · Updated Dec 16, 2025
MCP stdio server for Windsurf that routes tool calls to a local llama.cpp llama-server (GGUF), optimized for low-VRAM GPUs.
Python · Updated Dec 13, 2025
Wolfram-llama.cpp Public
A sample project demonstrating how to use Wolfram with llama.cpp.
Updated Nov 18, 2025
llama.cpp Public
Forked from ggml-org/llama.cpp
LLM inference in C/C++
C++ · MIT License · Updated Sep 25, 2025
FreeDomain Public
Forked from DigitalPlatDev/FreeDomain
DigitalPlat FreeDomain: Free Domain For Everyone
HTML · GNU Affero General Public License v3.0 · Updated May 14, 2025