Stars
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Tutel MoE: an optimized Mixture-of-Experts library supporting GptOss/DeepSeek/Kimi-K2/Qwen3 with FP8/NVFP4/MXFP4
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Awesome LLM compression research papers and tools.
Understanding Deep Learning - Simon J.D. Prince
This project shares the technical principles behind large language models together with hands-on experience (LLM engineering and bringing LLM applications to production).
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
My learning notes and code for machine learning systems (ML SYS).
Code for the paper "Evaluating Large Language Models Trained on Code"
A framework for the evaluation of autoregressive code generation language models.
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime (a minimal low-bit quantization sketch follows this list)
A low-latency & high-throughput serving engine for LLMs
A throughput-oriented high-performance serving framework for LLMs
ROCm / flash-attention
Forked from Dao-AILab/flash-attention: fast and memory-efficient exact attention
8-bit CUDA functions for PyTorch
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
Running large language models on a single GPU for throughput-oriented scenarios.
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
Code for the NeurIPS 2024 paper "QuaRot": end-to-end 4-bit inference of large language models (see the rotation sketch after this list).
Model Compression Toolbox for Large Language Models and Diffusion Models
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
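Several entries above (the low-bit quantization toolkits, QServe, OmniQuant, and the LLM compression paper lists) revolve around quantizing LLM weights down to a handful of bits. As a point of reference, here is a minimal, generic sketch of symmetric per-channel INT4 weight quantization in PyTorch; it only illustrates the basic round-to-scale idea and is not the API of any library listed above.

```python
# Generic sketch: symmetric per-output-channel INT4 weight quantization.
# Illustrative only; not the API of any of the libraries listed above.
import torch

def quantize_int4(weight: torch.Tensor):
    """Quantize a 2-D weight matrix to signed 4-bit values, one scale per row."""
    # Choose the per-row scale so the largest magnitude maps to the INT4 maximum (7).
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
    q = torch.clamp(torch.round(weight / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate floating-point weight for matrix multiplication."""
    return q.to(torch.float32) * scale

if __name__ == "__main__":
    w = torch.randn(4, 8)
    q, s = quantize_int4(w)
    w_hat = dequantize_int4(q, s)
    print("max abs reconstruction error:", (w - w_hat).abs().max().item())
```

Real toolkits add group-wise scales, zero points, outlier handling, and fused low-bit kernels on top of this basic scheme.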
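QuaRot and SpinQuant (both listed above) rotate weights and activations with an orthogonal matrix before quantization so that outliers are spread across channels. The toy sketch below shows why such a rotation preserves the layer output; it uses a random orthogonal matrix for simplicity and is not the Hadamard construction or learned-rotation procedure of either paper.

```python
# Toy sketch of rotation-based outlier smoothing (in the spirit of QuaRot/SpinQuant).
# For an orthogonal R, (x R)(W R)^T = x R R^T W^T = x W^T, so folding R into the
# weights and rotating the activations leaves the linear layer's output unchanged
# while spreading large per-channel values more evenly before quantization.
import torch

def random_orthogonal(n: int) -> torch.Tensor:
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix.
    q, _ = torch.linalg.qr(torch.randn(n, n))
    return q

torch.manual_seed(0)
d_in, d_out = 16, 8
W = torch.randn(d_out, d_in)   # linear layer weight
x = torch.randn(2, d_in)       # a small batch of activations
R = random_orthogonal(d_in)

W_rot = W @ R                  # rotation folded into the weights offline
x_rot = x @ R                  # matching rotation applied to activations at run time

y_ref = x @ W.T
y_rot = x_rot @ W_rot.T        # identical up to floating-point error
print("max output difference:", (y_ref - y_rot).abs().max().item())
```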