dzwduan
🏖️
Never give up
  • Institute of Computing Technology, CAS


Starred repositories

Low overhead tracing library and trace visualizer for pipelined CUDA kernels

C 93 5 Updated Nov 11, 2025

Tips for Writing a Research Paper using LaTeX

TeX 3,619 403 Updated May 4, 2023

An Eclipse 4 RCP based GUI to interact with SystemC simulators

Java 15 4 Updated Sep 22, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,162 3,950 Updated Nov 10, 2025

A tiny deep learning training framework implemented from scratch in C++ that follows PyTorch's API.

C++ 123 23 Updated Nov 1, 2025

Open ABI and FFI for Machine Learning Systems

C++ 171 34 Updated Nov 11, 2025

NanoGPT (124M) in 3 minutes

Python 3,792 495 Updated Nov 6, 2025

GPU Kernels

Cuda 206 18 Updated Apr 27, 2025

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

400 39 Updated Aug 2, 2025

Context7 MCP Server -- Up-to-date code documentation for LLMs and AI code editors

JavaScript 36,826 1,820 Updated Nov 10, 2025

JAX support for tvm-ffi abi

C++ 16 2 Updated Nov 6, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,223 283 Updated Nov 11, 2025

A simple, lightweight PowerShell script to remove pre-installed apps, disable telemetry, as well as perform various other changes to customize, declutter and improve your Windows experience. Win11D…

PowerShell 32,783 1,283 Updated Nov 10, 2025

A step-by-step tutorial that allows beginners to write their own autonomous vehicle program from scratch using a simple starter kit. Dora-drives makes learning autonomous vehicle systems faster and…

Python 69 12 Updated Jun 21, 2024

A research project that analyzes the system prompts of large language models

69 7 Updated Oct 13, 2025

The best ChatGPT that $100 can buy.

Python 36,399 4,349 Updated Nov 5, 2025
Python 35 3 Updated Oct 12, 2025

A simple, minimal riscv32imac virtual machine that supports booting Linux with MMU and SMP.

C 1 Updated Dec 22, 2024

A light llama-like llm inference framework based on the triton kernel.

Python 161 22 Updated Sep 20, 2025

Recommend new arXiv papers of your interest daily according to your Zotero library.

Python 3,975 3,476 Updated Aug 16, 2025

LeetGPU Solutions

Python 79 5 Updated Oct 9, 2025

A Heterogeneous GPU Platform for Chipyard SoC

Scala 36 1 Updated Nov 11, 2025

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 967 99 Updated Dec 30, 2024

Claude Code Comprehensive Guide

1,945 212 Updated Nov 6, 2025

This repository provides an HLS-based implementation of Tiny-LLAMA and Llama 2B. We have included detailed host files, configuration files, and the src directory, as well as a ready-to-run bitstrea…

6 1 Updated Sep 19, 2025

A SystemVerilog language server based on the Slang library.

C++ 58 8 Updated Nov 11, 2025

Notes and assignments from my CS336 studies

Python 181 1 Updated Nov 9, 2025

vortex backend

C++ 4 Updated Oct 12, 2025

Fast CUDA matrix multiplication from scratch

Cuda 936 139 Updated Sep 2, 2025