Abdennacer-Badaoui
Member

I'm working on adding DeepSeek-V3 ONNX export support.

The model uses an attention architecture in which keys and values have different head dimensions: the key head dimension is the sum of the RoPE and NoPE dimensions (qk_rope_head_dim + qk_nope_head_dim), while values use a separate v_head_dim. This differs from standard transformers, where keys and values generally share the same head dimension.

The DeepSeekV3DummyPastKeyValuesGenerator creates dummy past key/value tensors with these distinct key and value shapes for ONNX export.
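
To illustrate the shape difference, here is a minimal sketch of the shape arithmetic only, not the generator's actual code. `MLAConfig` and `past_key_value_shapes` are hypothetical names, the `(batch, heads, sequence, head_dim)` layout is assumed, and the numbers in the usage lines are illustrative rather than the real model configuration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Shape = Tuple[int, int, int, int]


@dataclass
class MLAConfig:
    # Hypothetical container; field names mirror the config attributes named above.
    num_layers: int
    num_key_value_heads: int
    qk_rope_head_dim: int
    qk_nope_head_dim: int
    v_head_dim: int


def past_key_value_shapes(
    cfg: MLAConfig, batch_size: int, past_sequence_length: int
) -> List[Tuple[Shape, Shape]]:
    """Return one (key_shape, value_shape) pair per layer."""
    # Keys concatenate the RoPE and NoPE parts, so their head dim is the sum.
    key_head_dim = cfg.qk_rope_head_dim + cfg.qk_nope_head_dim
    # Assumed layout: (batch, heads, sequence, head_dim).
    key_shape = (batch_size, cfg.num_key_value_heads, past_sequence_length, key_head_dim)
    # Values use their own, independent head dim.
    value_shape = (batch_size, cfg.num_key_value_heads, past_sequence_length, cfg.v_head_dim)
    return [(key_shape, value_shape) for _ in range(cfg.num_layers)]


# Illustrative values only.
cfg = MLAConfig(
    num_layers=2,
    num_key_value_heads=4,
    qk_rope_head_dim=64,
    qk_nope_head_dim=128,
    v_head_dim=128,
)
shapes = past_key_value_shapes(cfg, batch_size=1, past_sequence_length=8)
print(shapes[0])  # ((1, 4, 8, 192), (1, 4, 8, 128))
```

A generator that assumed a single shared head dimension would produce key and value tensors of the same shape, which would not match the key and value inputs the exported model expects.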

@IlyasMoutawwakil IlyasMoutawwakil merged commit 0342fd1 into main Sep 2, 2025
10 of 14 checks passed
@IlyasMoutawwakil IlyasMoutawwakil deleted the add-deepseekv3-dummypastkeyvaluesgenerator branch September 2, 2025 15:54