DHdroid

Organizations

@little-piplup

[NeurIPS 2025] Multipole Attention for Efficient Long Context Reasoning

14 Updated Jun 10, 2025

[Survey] Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization

Python 156 4 Updated Nov 10, 2025

[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)

Python 152 7 Updated Nov 21, 2025

"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai

Python 9,690 1,516 Updated Nov 26, 2025

Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.

Python 40,838 5,459 Updated Nov 27, 2025

The best ChatGPT that $100 can buy.

Python 37,662 4,616 Updated Nov 17, 2025

D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

56 2 Updated Oct 20, 2025

Nano vLLM

Python 9,305 1,145 Updated Nov 3, 2025

Common recipes to run vLLM

Jupyter Notebook 244 89 Updated Nov 26, 2025

Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2/3/GB300 NVL72 - DeepSeek 670B MoE, GPTOSS

Python 383 55 Updated Nov 27, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 736 81 Updated Apr 6, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 584 71 Updated Sep 11, 2024

An open protocol enabling communication and interoperability between opaque agentic applications.

Shell 20,834 2,127 Updated Nov 26, 2025

The evaluation framework for training-free sparse attention in LLMs

Python 106 8 Updated Oct 13, 2025

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Python 59,709 7,314 Updated Oct 4, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,558 712 Updated Nov 27, 2025

A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨

Python 267 20 Updated Apr 26, 2024

🐳 | Dockerfiles for the RunPod container images used for our official templates.

Jupyter Notebook 211 114 Updated Nov 14, 2025

CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.

Python 22 Updated Dec 11, 2024

Development repository for the Triton language and compiler

MLIR 17,685 2,411 Updated Nov 27, 2025

Python 569 49 Updated Oct 29, 2024

VQVAEs, GumbelSoftmaxes and friends

Jupyter Notebook 617 50 Updated Nov 20, 2021

[SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference

Python 77 16 Updated Nov 8, 2025

[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

Python 132 9 Updated Dec 4, 2024

[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2

Python 264 32 Updated Aug 28, 2025

QLoRA: Efficient Finetuning of Quantized LLMs

Jupyter Notebook 10,769 866 Updated Jun 10, 2024

VPTQ, A Flexible and Extreme low-bit quantization algorithm

Python 667 49 Updated Apr 25, 2025

Awesome LLM compression research papers and tools.

1,720 111 Updated Nov 10, 2025