add channel last 3d support for maxpool3d on CPU #97775
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97775
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure
As of commit b62a746 with merge base c68d0a7:
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
### Testing

Single socket (28 cores):

| shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms |
| -- | -- | -- | -- | -- |
| size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 3.959584 | 5.493402 | 0.557232 | 0.568485 |
| size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 0.815511 | 1.351261 | 5.710506 | 10.57506 |
| size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 10.63426 | 15.28637 | 2.67656 | 1.71365 |
| size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 2.63570 | 2.05532 | 2.55452 | 2.33923 |
| size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 0.375469 | 0.479748 | 0.066364 | 0.065155 |
| size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 0.112197 | 0.112326 | 0.111697 | 0.145364 |

Single core:

| shape | fp32 forward / ms | bf16 forward / ms | fp32 backward / ms | bf16 backward / ms |
| -- | -- | -- | -- | -- |
| size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig | 92.16582 | 128.6513 | 6.684325 | 12.21541 |
| size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL | 10.14318 | 29.80297 | 7.350142 | 11.25323 |
| size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig | 238.55453 | 331.89967 | 19.694657 | 32.78853 |
| size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL | 30.17079 | 32.75628 | 22.44543 | 30.17796 |
| size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig | 7.474389 | 9.937217 | 0.236015 | 0.434229 |
| size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d | 2.318954 | 2.469444 | 0.262125 | 0.401361 |
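To make the timings above easier to compare, here is a small sketch that turns them into contiguous-over-channels-last ratios (values above 1 mean the channels-last run was faster). The numbers are copied from the single-socket rows for two of the shapes. The names `timings_ms`, `COLUMNS`, and `speedup` are made up for this illustration and are not part of the PR.

```python
# Timings in ms, copied from the single-socket (28 cores) measurements.
# Key: (input shape, memory format). Value: one timing per entry in COLUMNS.
COLUMNS = ("fp32 forward", "bf16 forward", "fp32 backward", "bf16 backward")

timings_ms = {
    ("(1, 56, 264, 264)", "contig"): (3.959584, 5.493402, 0.557232, 0.568485),
    ("(1, 56, 264, 264)", "CL"): (0.815511, 1.351261, 5.710506, 10.57506),
    ("(4, 19, 10, 16, 16)", "contig"): (0.375469, 0.479748, 0.066364, 0.065155),
    ("(4, 19, 10, 16, 16)", "CL3d"): (0.112197, 0.112326, 0.111697, 0.145364),
}


def speedup(shape, channels_last_format):
    """Return contiguous time divided by channels-last time for each column."""
    baseline = timings_ms[(shape, "contig")]
    candidate = timings_ms[(shape, channels_last_format)]
    return {name: b / c for name, b, c in zip(COLUMNS, baseline, candidate)}


for shape, fmt in (("(1, 56, 264, 264)", "CL"), ("(4, 19, 10, 16, 16)", "CL3d")):
    for name, ratio in speedup(shape, fmt).items():
        print(f"{shape} {fmt} {name}: {ratio:.2f}x")
```

For these rows the forward ratios come out above 1 and the backward ratios below 1, so the reported gain is in the forward pass.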
@mikaylagawarecki Could you please review this PR? Thank you.
@CaoE Sure, will take a look later today!
Hey @CaoE, as a broader comment:
Following up from our previous discussion, is #106104 ready for review? Happy to review it if it is (and would prefer that we use that testing before landing the rest of these PRs :)
Separately, we must be wary that the test_memory_format tests for ModuleInfos have been skipped rather than xfailed for quite a few ops. As we add channels_last_3d support for more of them, could we unskip these tests as well please? E.g. https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_modules.py#L2555
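For readers less familiar with the layout under discussion, here is a minimal pure-Python sketch of the strides that distinguish a channels_last_3d tensor from a default contiguous one for the same (N, C, D, H, W) shape. Both helper names are hypothetical and written only for this illustration; they are not taken from the PR or from the test suite.

```python
def contiguous_strides(shape):
    """Strides (in elements) of a default row-major tensor: last dim is densest."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_3d_strides(shape):
    """Strides (in elements) when memory is ordered N, D, H, W, C.

    The logical shape stays (N, C, D, H, W); only the strides change,
    so the channel dimension becomes the densest one (stride 1).
    """
    n, c, d, h, w = shape
    return (c * d * h * w, 1, h * w * c, w * c, c)


shape = (4, 19, 10, 16, 16)  # the 5-D shape used in the benchmarks
print("contiguous:      ", contiguous_strides(shape))
print("channels_last_3d:", channels_last_3d_strides(shape))
```

In PyTorch this layout is requested with `x.to(memory_format=torch.channels_last_3d)`. For a tensor with no size-1 dimensions, `x.stride()` is expected to match the second helper.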
@mikaylagawarecki Could you please review this PR? #106104 is ready for review. Thank you.
@mikaylagawarecki Can we merge this PR before the 2.1 branch cut, if possible?
Thanks for using the ModuleInfo testing
@pytorchbot merge -f "macos failures are unrelated"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Pull Request resolved: #97775
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki