[CI][CUDA][Distributed] test_ring_flex_attention failed on 8xB200 Runner #162820

@nWEIdia

Description

🐛 Describe the bug

Tracked in umbrella issue #162178.
Job link: https://github.com/pytorch/pytorch/actions/runs/17660052730/job/50193312091

Failure message:

```
2025-09-12T05:47:07.8805304Z     expect_out, expect_lse = compiled_flex_attention(
2025-09-12T05:47:07.8805570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 841, in compile_wrapper
2025-09-12T05:47:07.8805776Z     raise e.with_traceback(None) from e.__cause__  # User compiler error
2025-09-12T05:47:07.8806030Z torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
2025-09-12T05:47:07.8806214Z   Explanation: Dynamo does not know how to trace the Python builtin _warnings.warn.
2025-09-12T05:47:07.8806549Z   Hint: If you are attempting to call a logging function (e.g. _warnings.warn), you can try adding it to torch._dynamo.config.reorderable_logging_functions.
2025-09-12T05:47:07.8806723Z   Hint: Please file an issue on GitHub so the PyTorch team can add support for it.
2025-09-12T05:47:07.8806754Z
2025-09-12T05:47:07.8806953Z   Developer debug context: module: _warnings, qualname: warn, skip reason: <missing reason>
2025-09-12T05:47:07.8806957Z
2025-09-12T05:47:07.8807256Z  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html
2025-09-12T05:47:07.8807260Z
2025-09-12T05:47:07.8807348Z from user code:
2025-09-12T05:47:07.8807707Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py", line 1613, in flex_attention
2025-09-12T05:47:07.8807824Z     _warn_once(
2025-09-12T05:47:07.8808100Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/attention/flex_attention.py", line 65, in _warn_once
2025-09-12T05:47:07.8808250Z     warnings.warn(message, category, stacklevel=2)
2025-09-12T05:47:07.8808254Z
2025-09-12T05:47:07.8808676Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-09-12T05:47:07.8808680Z
2025-09-12T05:47:07.8808831Z To execute this test, run the following from the base repo dir:
2025-09-12T05:47:07.8809086Z python test/distributed/tensor/test_attention.py RingFlexAttentionTest.test_ring_flex_attention
2025-09-12T05:47:07.8809090Z
2025-09-12T05:47:07.8809286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-09-12T05:47:07.8809421Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-09-12T05:47:07.8809584Z ================== 1 failed, 5 deselected, 2 rerun in 40.40s ===================
```
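The hint above points at `torch._dynamo.config.reorderable_logging_functions`. Below is a minimal sketch of that suggested workaround, outside the ring-attention test harness; the CUDA device, tensor shapes, and the bare `flex_attention` call are illustrative assumptions, not taken from the failing test, and it is unverified whether this unblocks `test_ring_flex_attention` specifically.

```python
# Sketch of the graph-break hint: register warnings.warn as a reorderable
# logging function so Dynamo can defer the call instead of aborting tracing.
import warnings

import torch
import torch._dynamo.config as dynamo_config
from torch.nn.attention.flex_attention import flex_attention

# Allow Dynamo to reorder (and therefore trace past) warnings.warn calls.
dynamo_config.reorderable_logging_functions.add(warnings.warn)

compiled_flex_attention = torch.compile(flex_attention)

# Illustrative shapes only: (batch, heads, seq_len, head_dim) on a CUDA device.
q, k, v = (torch.randn(2, 4, 128, 64, device="cuda") for _ in range(3))

# return_lse=True mirrors the failing call site, which unpacks (out, lse).
out, lse = compiled_flex_attention(q, k, v, return_lse=True)
```

A longer-term fix presumably lives in `torch/nn/attention/flex_attention.py` itself (making the `_warn_once` call traceable or moving it out of the compiled region), but that is a guess; the log only establishes where the graph break occurs.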

Versions

TOT (top of tree / current main)

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci @seemethere @malfet @pytorch/pytorch-dev-infra @mruberry @chauhang @penguinwu @zou3519 @ydwu4 @bdhirsh @Chillee @drisspg @yanboliang @BoyuanFeng

Metadata

Assignees

No one assigned

    Labels

    module: ci (Related to continuous integration)
    module: flex attention
    module: higher order operators (torch.cond and similar)
    module: pt2-dispatcher (PT2 dispatcher-related issues, e.g. aotdispatch, functionalization, faketensor, custom-op)
    module: tests (Issues related to tests, not the torch.testing module)
    oncall: distributed (Add this issue/PR to distributed oncall triage queue)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
