[inductor] verify determinism with inductor benchmark script #164904
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164904
Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 1 Pending: as of commit 496659f with merge base 50c338c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```
if config.deterministic:
    torch.backends.cudnn.deterministic = True
```
Without knowing much about when this code is executed, I feel like this should be reset somehow when we are done with it. Or are we documenting that calling compile with deterministic=True will permanently change these?
I think ideally, when a user decides to enable deterministic mode, they should already have set these flags accordingly. The settings here just make sure that if a user didn't do that, we do it for them.
Not resetting has the benefit that if some graph falls back to eager, it can still run deterministically.
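For context, a minimal sketch of the user-side setup this fallback assumes (this is not the PR's code; `cudnn.benchmark = False` is a common companion flag added here for illustration, not something this diff touches):

```
import torch

# What a user opting into determinism would typically set themselves; the
# Inductor config above only backfills cudnn.deterministic when they did not.
torch.use_deterministic_algorithms(True)   # error out on non-deterministic ops
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable cuDNN autotuning, whose picks can vary
```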
Starting merge as part of PR stack under #164905
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: new commits were pushed while merging. Please rerun the merge command. (Details for Dev Infra team: raised by workflow job.)
Starting merge as part of PR stack under #164905
… inductor (#164905)

Previously, when torch.are_deterministic_algorithms_enabled() is True, Inductor will:
- skip autotuning pointwise kernels
- pick a fixed (and quite arbitrary) config for reductions

This PR changes the behavior to:
- for pointwise kernels, we still do autotuning
- for reduction kernels, we use the recently added heuristic to pick a config

Pull Request resolved: #164905
Approved by: https://github.com/jansel, https://github.com/v0i0
ghstack dependencies: #164801, #164532, #164904
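To make that change concrete, here is a hypothetical sketch (all names are invented for illustration; this is not Inductor's actual API) of the split between autotuned and heuristic config selection:

```
# Hypothetical sketch; `heuristic_pick` and `benchmark_ms` are invented names.
def choose_config(configs, kind, deterministic, heuristic_pick, benchmark_ms):
    if deterministic and kind == "reduction":
        # Timing-independent choice: the same config every run, hence the
        # same reduction order and bitwise-identical results.
        return heuristic_pick(configs)
    # Pointwise kernels (even in deterministic mode) and everything in the
    # default mode: take the fastest measured config, which can vary per run.
    return min(configs, key=benchmark_ms)
```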
@pytorchbot revert -m 'Sorry for reverting your PR but there seem to be some vLLM failures coming out of this' -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
…164904)"

This reverts commit a3c7006.

Reverted #164904 on behalf of https://github.com/huydhn due to: "Sorry for reverting your PR but there seem to be some vLLM failures coming out of this" (comment on #164904)
@shunting314 your PR has been successfully reverted.
Stack from ghstack (oldest at bottom):

Verify the deterministic mode with the torch.compile benchmark scripts.

Here is what my testing script does (pasted at the end):
- Run a model in default mode and save its result.
- Run the model again in default mode, but distort the benchmarking results. Compare with the saved result.
- Do the above again in deterministic mode.

I tried to test a few models:
- BertForMaskedLM and GoogleFnet: I can repro the numeric change by distorting the benchmark results in the default mode. The non-determinism is gone in deterministic mode.
- DistillGPT2: I can not repro the numeric change by distorting the benchmarking results in the default mode. That does not surprise me much: a reduction-order change does not always cause a numeric change.

```
model=GoogleFnet
export TORCHINDUCTOR_WRITE_ARE_DETERMINISTIC_ALGORITHMS_ENABLED=0
export TORCHINDUCTOR_FORCE_DISABLE_CACHES=1  # disable autotune cache
export TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE=0
export TORCHINDUCTOR_FX_GRAPH_CACHE=0
export TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_shunting/
export TORCHINDUCTOR_BENCHMARK_KERNEL=1
export TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1
export INDUCTOR_TEST_DISABLE_FRESH_CACHE=1

# Non-deterministic mode
# --float32 rather than --amp to make it easier to repro non-determinism
echo "Save results for non-deterministic mode"
python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --save-model-outputs-to=/tmp/saved-non-deterministic.pkl

echo "Compare results with distorted benchmarking in non-deterministic mode"
TORCHINDUCTOR_DISTORT_BENCHMARKING_RESULT=inverse python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --compare-model-outputs-with=/tmp/saved-non-deterministic.pkl

echo "Save results for deterministic mode"
TORCHINDUCTOR_DETERMINISTIC=1 python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --save-model-outputs-to=/tmp/saved-deterministic.pkl

echo "Compare results with distorted benchmarking in deterministic mode"
TORCHINDUCTOR_DETERMINISTIC=1 TORCHINDUCTOR_DISTORT_BENCHMARKING_RESULT=inverse python benchmarks/dynamo/huggingface.py --backend inductor --float32 --accuracy --only $model --training --disable-cudagraphs --compare-model-outputs-with=/tmp/saved-deterministic.pkl
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela
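For illustration only: one plausible reading of the `inverse` distortion is that measured kernel timings are inverted before autotuning ranks them, so the previously fastest config loses; the PR's actual implementation may differ:

```
# Hypothetical sketch of an "inverse" timing distortion. Flipping the ranking
# makes autotuning pick a different kernel config, which can change reduction
# order and therefore floating-point results in default mode.
def distort_timing(ms, mode="inverse"):
    if mode == "inverse":
        return 1.0 / max(ms, 1e-9)  # fastest kernel now looks slowest
    return ms
```

Deterministic mode should make the final outputs insensitive to this distortion, which is what the bitwise save/compare runs above (--save-model-outputs-to / --compare-model-outputs-with) verify.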