xyfgemini
🎯 Resilience
  • China

Highlights

  • Pro


A repo for llm on ncnn

C++ 166 20 Updated Dec 31, 2025

Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementation

Python 129 8 Updated Dec 31, 2025

Test PL DDR; runs in C++; self-defined memory controller for PL DDR

SystemVerilog 3 Updated Jan 15, 2025

Cookbook of SGLang - Recipe

JavaScript 48 9 Updated Jan 2, 2026

tinyGPU: A Predicated-SIMD processor implementation in SystemVerilog

SystemVerilog 55 13 Updated Jul 14, 2021

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

11 Updated Jul 15, 2025

Release of stream-specialization software/hardware stack.

Python 120 25 Updated May 5, 2023

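An AI-native PPT generator based on nano banana pro🍌, moving toward a true "Vibe PPT": upload any template image; upload any source material with intelligent parsing; generate a PPT automatically from a single sentence, an outline, or page descriptions; edit specified regions by spoken instruction; export in one click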

Python 8,020 869 Updated Jan 2, 2026

Premium Email Blog

9 1 Updated Jun 13, 2024

C++ 68 30 Updated Jun 7, 2017

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,719 88 Updated Dec 20, 2025

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 502 24 Updated Dec 23, 2025

"Paper2Slides: From Paper to Presentation in One Click"

Python 2,620 351 Updated Dec 31, 2025

A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

TypeScript 1,228 60 Updated Dec 30, 2025

Python 11 Updated Aug 16, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 799 56 Updated Mar 6, 2025

CXL-DMSim: A Full-System CXL Disaggregated Memory Simulator With Comprehensive Silicon Validation

C++ 116 29 Updated Oct 22, 2025

Chisel examples and code snippets

Tcl 265 87 Updated Aug 1, 2022

Preview Code for Continuum Paper

Python 19 3 Updated Dec 8, 2025

Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”

Python 98 2 Updated Jan 2, 2026

A framework for efficient model inference with omni-modality models

Python 1 Updated Dec 2, 2025

RISC-V SystemC-TLM simulator

C 335 82 Updated Nov 8, 2025

DRAMsim3: a Cycle-accurate, Thermal-Capable DRAM Simulator

C++ 435 181 Updated Aug 3, 2024

A simulator for SK hynix AiM PIM architecture based on Ramulator 2.0

C++ 52 10 Updated Jul 22, 2025

Sample codes for my CUDA programming book

Cuda 1,965 380 Updated Dec 14, 2025

The copy of materials from UPMEM website.

1 Updated May 28, 2025

NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

Jupyter Notebook 108 28 Updated Jun 19, 2024

Processing-In-Memory (PIM) Simulator

C++ 216 65 Updated Dec 12, 2024

The official repository for the gem5 computer-system architecture simulator.

C++ 2 Updated Sep 22, 2025