SiriusNEO

Building the Virtuous Cycle for AI-driven LLM Systems

Python 60 11 Updated Oct 23, 2025

Contexts Optical Compression

Python 15,167 901 Updated Oct 23, 2025

How to ensure correctness and ship LLM generated kernels in PyTorch

Python 104 14 Updated Oct 22, 2025

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

205 4 Updated Aug 26, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 472 47 Updated Oct 23, 2025

Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

Python 41,654 4,466 Updated Oct 23, 2025

nanobind: tiny and efficient C++/Python bindings

C++ 3,091 257 Updated Oct 17, 2025

Ascend TileLang adapter

C++ 130 24 Updated Oct 23, 2025

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >70% on SWE-bench verified!

Python 1,918 200 Updated Oct 21, 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 112 6 Updated Oct 22, 2025

"RAG-Anything: All-in-One RAG Framework"

Python 9,394 1,082 Updated Oct 20, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,224 227 Updated Oct 23, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,543 276 Updated Oct 22, 2025

Fast and memory-efficient exact kmeans

Python 112 6 Updated Oct 23, 2025

Letta is the platform for building stateful agents: open AI with advanced memory that can learn and self-improve over time.

Python 18,890 1,960 Updated Oct 20, 2025

[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation

Python 527 29 Updated Sep 18, 2025

[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.

Python 198 26 Updated May 31, 2025

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism (NIPS'25)

Python 12 Updated Oct 6, 2025

[OSDI'25] QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach

9 Updated Jun 20, 2025

Source code repository for ASPLOS '25 paper "Syno: Structured Synthesis for Neural Operators"

C++ 15 Updated Aug 31, 2025

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 3,188 328 Updated Oct 23, 2025

Run the latest vscode-server on RHEL/CentOS 7!

C 146 15 Updated Oct 23, 2025

A torch compile backend for multi-targets

Python 39 14 Updated Oct 23, 2025

Go 63 1 Updated Sep 15, 2025

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 12,502 1,254 Updated Oct 22, 2025

Model Context Protocol Servers

TypeScript 71,107 8,491 Updated Oct 20, 2025

Open ABI and FFI for Machine Learning Systems

C++ 136 25 Updated Oct 23, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 439 100 Updated Oct 22, 2025