Starred repositories

Persist and reuse KV Cache to speed up your LLM.

Python 115 37 Updated Nov 10, 2025

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Python 387 38 Updated Apr 20, 2024

Open Source Continuous Inference Benchmarking - GB200 NVL72 vs MI355X vs B200 vs H200 vs MI325X & soon™ TPUv6e/v7/Trainium2 - DeepSeek 670B MoE, GPTOSS

Python 330 44 Updated Nov 10, 2025

Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message…

TypeScript 31,513 6,139 Updated Nov 10, 2025
Python 2,112 178 Updated Nov 4, 2025

Try the demo of WebANNS on our GitHub page!

C++ 12 Updated Jul 14, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,060 1,297 Updated Nov 10, 2025

Mamba SSM architecture

Python 16,382 1,485 Updated Oct 10, 2025

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 361 21 Updated Sep 15, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,806 297 Updated Nov 9, 2025

Query-Adaptive Vector Search

C++ 61 12 Updated Nov 3, 2025

SPy language

Python 645 37 Updated Nov 10, 2025

Universal LLM Deployment Engine with ML Compilation

Python 21,587 1,852 Updated Nov 4, 2025

Prompt Orchestration Markup Language

TypeScript 4,723 248 Updated Nov 7, 2025

Lightweight coding agent that runs in your terminal

Rust 50,197 6,228 Updated Nov 10, 2025

Microsoft Azure Traces

Jupyter Notebook 1,020 167 Updated Oct 20, 2025

[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo

Python 48 5 Updated Aug 5, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,152 1,917 Updated Nov 1, 2025

An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.

TypeScript 18,429 2,565 Updated Nov 7, 2025

🚀 The fast, Pythonic way to build MCP servers and clients

Python 20,149 1,483 Updated Nov 10, 2025

Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.

Rust 7,938 641 Updated Nov 10, 2025

This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding code links.

236 7 Updated Jul 29, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 776 54 Updated Mar 6, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,873 696 Updated Aug 18, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 434 33 Updated May 30, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 82,060 9,179 Updated Nov 10, 2025

Get started with building Fullstack Agents using Gemini 2.5 and LangGraph

Jupyter Notebook 17,288 2,944 Updated Oct 21, 2025

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

TypeScript 155,257 49,724 Updated Nov 10, 2025

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python 7,748 634 Updated Nov 6, 2025