Pinned repositories
- waqasm86.github.io (Public)
  Personal engineering portfolio showcasing CUDA + C++ + LLM inference projects. Features production-grade distributed systems, empirical performance research, and on-device AI optimization. Built wi…
- llcuda (Public · Python)
  CUDA-accelerated LLM inference for Python with automatic server management. Zero-configuration setup, JupyterLab-ready, production-grade performance. Just install and start running inference!
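The llcuda description centers on "automatic server management": the package is said to bring up and supervise its own inference server so a notebook user never starts one by hand. The snippet below is only a generic illustration of that pattern, not llcuda's actual API; the server command, model path, and port are placeholders.

```python
# Generic "automatic server management" pattern (illustration only; not llcuda's API).
# The command, model path, and port below are placeholders.
import socket
import subprocess
import time

def ensure_server(cmd=("./llama-server", "-m", "model.gguf", "--port", "8080"),
                  host="127.0.0.1", port=8080, timeout_s=60):
    """Start the inference server only if nothing is already listening on the port."""
    def listening():
        with socket.socket() as s:
            return s.connect_ex((host, port)) == 0

    if not listening():
        subprocess.Popen(cmd)                  # launch the server in the background
        deadline = time.time() + timeout_s
        while not listening():                 # wait until the port accepts connections
            if time.time() > deadline:
                raise TimeoutError("inference server did not come up")
            time.sleep(0.5)
    return f"http://{host}:{port}"
```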
- Ubuntu-Cuda-Llama.cpp-Executable (Public · Python · ★ 1)
  Pre-built llama.cpp CUDA binary for Ubuntu 22.04. No compilation required: download, extract, and run. Works with the llcuda Python package for JupyterLab integration. Tested on GPUs from the GeForce 940M to the RTX 4090.
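Once the pre-built binary is extracted and running (it is llama.cpp's bundled HTTP server), it can be queried from Python over the standard /completion endpoint. The host, port, and model are assumptions here; the exact binary name and flags should be checked against the archive's README.

```python
# Query a running llama.cpp server (the pre-built binary) over its HTTP API.
# Host and port are assumptions; /completion is llama.cpp's standard endpoint.
import json
import urllib.request

def complete(prompt, n_predict=64, url="http://127.0.0.1:8080/completion"):
    req = urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt, "n_predict": n_predict}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

print(complete("Explain what a GGUF file is in one sentence."))
```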
- cuda-nvidia-systems-engg (Public · C++)
  Production-grade C++20/CUDA distributed LLM inference system with TCP networking, MPI scheduling, and content-addressed storage. Features comprehensive benchmarking (p50/p95/p99 latencies), epoll a…
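The benchmarking vocabulary above (p50/p95/p99 latencies) refers to percentiles of the observed request-latency distribution. A small illustrative computation, not taken from the repository:

```python
# Illustrative only: deriving p50/p95/p99 latencies from recorded samples.
def percentile(samples, pct):
    """Nearest-rank percentile over a list of latency samples (in ms)."""
    ordered = sorted(samples)
    # Index of the smallest sample that covers pct% of the data.
    rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[rank]

latencies_ms = [12.1, 13.4, 12.8, 55.0, 14.2, 13.1, 90.3, 12.9, 13.7, 15.0]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.1f} ms")
```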
- cuda-tcp-llama.cpp (Public · C++)
  High-performance TCP inference gateway with epoll async I/O for CUDA-accelerated LLM serving. Binary protocol, connection pooling, and streaming responses. Zero dependencies beyond POSIX and CUDA.
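The epoll async I/O pattern mentioned above is easiest to see as a small event loop: one thread multiplexes many non-blocking sockets and only touches the ones the kernel reports as ready. The sketch below is a generic echo-style loop using Python's selectors module (which backs onto epoll on Linux), not the gateway's actual binary protocol.

```python
# Generic epoll-style event loop (echo server), illustrating the async I/O pattern;
# a real gateway would parse framed requests instead of echoing bytes.
import selectors
import socket

sel = selectors.DefaultSelector()              # uses epoll on Linux

listener = socket.socket()
listener.bind(("0.0.0.0", 9000))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

while True:
    for key, _events in sel.select():          # block until some socket is ready
        sock = key.fileobj
        if sock is listener:
            conn, _addr = listener.accept()    # new client connection
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                sock.sendall(data)             # echo back; stand-in for real request handling
            else:
                sel.unregister(sock)           # peer closed the connection
                sock.close()
```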
- cuda-llm-storage-pipeline (Public · C++)
  Content-addressed LLM model distribution with SHA256 verification and SeaweedFS integration. Distributed storage, manifest management, LRU caching, and integrity checking for GGUF models.
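Content addressing, as described above, means a model file is stored and looked up under the digest of its own bytes, so integrity checking reduces to re-hashing. A minimal sketch with hashlib follows; the on-disk layout is an assumption, not the repository's manifest format.

```python
# Minimal content-addressing sketch: store and verify a GGUF file under its SHA256 digest.
# The "cas-store" directory layout is an assumption, not the repository's manifest format.
import hashlib
import pathlib
import shutil

STORE = pathlib.Path("cas-store")

def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def put(path: pathlib.Path) -> str:
    """Copy a file into the store under its own digest and return that digest."""
    digest = sha256_of(path)
    STORE.mkdir(exist_ok=True)
    shutil.copy2(path, STORE / digest)
    return digest

def verify(digest: str) -> bool:
    """Integrity check: re-hash the stored bytes and compare with the address."""
    return sha256_of(STORE / digest) == digest
```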