Stars
Based on Nano-vLLM, a simple replication of vLLM with self-contained paged-attention and flash-attention implementations
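As a rough illustration of the paged-attention idea this repo implements, here is a minimal NumPy sketch (block sizes, names, and layout are assumptions for illustration, not Nano-vLLM's actual code): the KV cache lives in fixed-size physical blocks, and a per-sequence block table maps logical token positions to physical blocks.

```python
import numpy as np

BLOCK_SIZE = 4   # tokens per physical block (assumed)
NUM_BLOCKS = 8   # physical blocks in the pool (assumed)
HEAD_DIM = 64

# Physical pools for keys and values: [num_blocks, block_size, head_dim]
k_pool = np.random.randn(NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM).astype(np.float32)
v_pool = np.random.randn(NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM).astype(np.float32)

def gather_kv(block_table, seq_len):
    """Gather the logical KV sequence from scattered physical blocks."""
    keys, values = [], []
    for pos in range(seq_len):
        block = block_table[pos // BLOCK_SIZE]   # logical -> physical block
        offset = pos % BLOCK_SIZE                # slot within that block
        keys.append(k_pool[block, offset])
        values.append(v_pool[block, offset])
    return np.stack(keys), np.stack(values)

def paged_attention(query, block_table, seq_len):
    """Single-query attention over a paged KV cache."""
    k, v = gather_kv(block_table, seq_len)       # [seq_len, head_dim]
    scores = k @ query / np.sqrt(HEAD_DIM)       # scaled dot product
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ v                             # weighted sum of values

# A 6-token sequence stored in non-contiguous physical blocks 5 and 2.
out = paged_attention(np.random.randn(HEAD_DIM).astype(np.float32),
                      block_table=[5, 2], seq_len=6)
```

The point of the indirection is that blocks can be allocated and freed like virtual-memory pages, so sequences of different lengths share one KV pool without fragmentation.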
PL DDR test; runs in C++; self-defined memory controller for PL DDR
tinyGPU: A Predicated-SIMD processor implementation in SystemVerilog
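For context on what predicated-SIMD execution means, a toy Python model (purely illustrative; this is not tinyGPU's ISA or its SystemVerilog source): every lane executes the same instruction in lockstep, and a per-lane predicate bit gates whether the lane's result is written back.

```python
import numpy as np

LANES = 8  # assumed lane count for the example

def predicated_add(dst, src, pred):
    """dst[i] + src[i] commits only where pred[i] is set; other lanes keep dst[i]."""
    result = dst + src                   # all lanes execute in lockstep
    return np.where(pred, result, dst)   # the predicate gates write-back

regs = np.arange(LANES)                  # r0 = [0, 1, ..., 7]
pred = regs % 2 == 0                     # predicate: even lanes only
print(predicated_add(regs, np.full(LANES, 10), pred))
# -> [10  1 12  3 14  5 16  7]: odd lanes keep their old values
```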
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Release of the stream-specialization software/hardware stack.
An AI-native PPT generation app based on nano banana pro 🍌, moving toward a true "Vibe PPT": supports uploading arbitrary template images; uploading arbitrary assets with intelligent parsing; auto-generating a PPT from a single sentence, an outline, or per-page descriptions; verbally revising a specified region; and one-click export.
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
"Paper2Slides: From Paper to Presentation in One Click"
A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
CXL-DMSim: A Full-System CXL Disaggregated Memory Simulator With Comprehensive Silicon Validation
Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”
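For background, here is a minimal NumPy sketch of the plain block-scaled FP4 baseline that adaptive block scaling improves on (the E2M1 value grid and 16-element block size follow the NVFP4 format; the rest is an illustrative assumption, not the paper's code): each block is rescaled so its largest magnitude maps to FP4's maximum, then every value is rounded to the nearest representable point.

```python
import numpy as np

# Magnitudes representable in FP4 E2M1 (plus sign).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 scales blocks of 16 elements

def quantize_block(x):
    """Quantize-dequantize one block with a naive per-block max scale."""
    amax = np.abs(x).max()
    scale = amax / 6.0 if amax > 0 else 1.0   # map block max to FP4's max (6.0)
    mags = np.abs(x) / scale
    # Round each magnitude to the nearest FP4 grid point.
    idx = np.abs(mags[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(x) * FP4_GRID[idx] * scale

x = np.random.randn(BLOCK).astype(np.float32)
xq = quantize_block(x)
print("max abs error:", np.abs(x - xq).max())
```

The weakness of this baseline, which adaptive schemes target, is that a single outlier in a block inflates the scale and crushes the resolution available to the remaining fifteen values.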
WangRongsheng / vllm-omni
Forked from vllm-project/vllm-omni. A framework for efficient model inference with omni-modality models.
DRAMsim3: a Cycle-accurate, Thermal-Capable DRAM Simulator
arkhadem / aim_simulator
Forked from CMU-SAFARI/ramulator2. A simulator for the SK hynix AiM PIM architecture, based on Ramulator 2.0.
Sample code for my CUDA programming book
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing
hplp / GPU-PIM
Forked from gem5/gem5. The official repository for the gem5 computer-system architecture simulator.