@lucylq lucylq commented Jul 6, 2023

Summary:
Currently, broadcasting is supported for 4D tensors only in a restricted form: if the batch or channel dimensions of the two tensors are not equal, then both the batch and the channel of one tensor must be 1, i.e.:

tensorA NCHW:
5, 2, 3, 3
tensorB NCHW:
1, 1, 3, 3 --> batch=1, channel=1

This diff adds broadcast support for 4D tensors whose batch and channel dimensions differ and where the size-1 dimensions are not confined to a single tensor, e.g.:

tensorA NCHW:
5, 1, 3, 3
tensorB NCHW:
1, 5, 3, 3

Broadcast rules (for each dimension x, at least one of the following must hold):

- tensorA.dim()[x] == tensorB.dim()[x]
- tensorA.dim()[x] == 1 || tensorB.dim()[x] == 1
- tensorA.dim()[x] does not exist || tensorB.dim()[x] does not exist
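
The rules above can be sketched as a small shape check. This is only an illustration in Python, not code from this diff: the function name `broadcast_shape` is hypothetical, and it assumes dimensions are aligned from the trailing end, as in the usual PyTorch broadcasting semantics, so that a dimension that "does not exist" behaves like a size of 1.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast output shape, or raise ValueError if incompatible.

    Hypothetical helper for illustration; not part of the Vulkan backend.
    """
    out = []
    # Walk both shapes from the last dimension; a missing dimension counts as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            out.append(a)
        elif a == 1:
            out.append(b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(out))


# Previously supported case: batch and channel of one tensor are both 1.
old_case = broadcast_shape((5, 2, 3, 3), (1, 1, 3, 3))
# Case added by this diff: the size-1 dimensions sit in different tensors.
new_case = broadcast_shape((5, 1, 3, 3), (1, 5, 3, 3))
```

Under these assumptions the second call yields a 5 x 5 x 3 x 3 output, which is larger than either input.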

Broadcast method:

  1. Pass the output, input and other tensors to the shader
  2. Iterate over the output texture and compute the value of each texel directly (without repeating the input tensors)
  3. Map NHW positions with modulo
  4. Map the C position by dividing pos.z by ceil(C/4), which maps it back to the original tensor range
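
As a CPU-side reference for the method above, here is a minimal Python sketch that walks the output positions and maps each one back to both inputs with modulo. It deliberately ignores the channel packing used by the Vulkan textures and treats tensors as flat row-major NCHW lists; `flat_index` and `broadcast_binary` are hypothetical names, not functions from this diff.

```python
import operator
from itertools import product


def flat_index(shape, idx):
    """Row-major offset of the multi-index idx in a contiguous tensor."""
    offset = 0
    for size, i in zip(shape, idx):
        offset = offset * size + i
    return offset


def broadcast_binary(op, a, shape_a, b, shape_b, out_shape):
    """Apply op elementwise over out_shape without expanding the inputs."""
    out = []
    for idx in product(*(range(s) for s in out_shape)):
        # A size-1 input dimension always maps to index 0; an equal-sized
        # dimension maps to itself. Modulo covers both cases.
        idx_a = tuple(i % s for i, s in zip(idx, shape_a))
        idx_b = tuple(i % s for i, s in zip(idx, shape_b))
        out.append(op(a[flat_index(shape_a, idx_a)],
                      b[flat_index(shape_b, idx_b)]))
    return out


# Small stand-in for the 5,1,3,3 and 1,5,3,3 case: shapes 2,1,1,2 and 1,2,1,2.
result = broadcast_binary(operator.add,
                          [1, 2, 3, 4], (2, 1, 1, 2),
                          [10, 20, 30, 40], (1, 2, 1, 2),
                          (2, 2, 1, 2))
# result == [11, 22, 31, 42, 13, 24, 33, 44]
```

A reference of this kind is mainly useful for checking shader output in tests; it says nothing about how the shader lays out its data.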

Also refactors some tests to reduce repeated setup code.

Test Plan:
New tests:

Add

[ RUN      ] VulkanAPITest.add_broadcast5
[       OK ] VulkanAPITest.add_broadcast5 (0 ms)
[ RUN      ] VulkanAPITest.add_broadcast6
[       OK ] VulkanAPITest.add_broadcast6 (0 ms)

Sub

[ RUN      ] VulkanAPITest.sub_broadcast5
[       OK ] VulkanAPITest.sub_broadcast5 (0 ms)
[ RUN      ] VulkanAPITest.sub_broadcast6
[       OK ] VulkanAPITest.sub_broadcast6 (0 ms)

Mul

[ RUN      ] VulkanAPITest.mul_broadcast5
[       OK ] VulkanAPITest.mul_broadcast5 (1 ms)
[ RUN      ] VulkanAPITest.mul_broadcast6
[       OK ] VulkanAPITest.mul_broadcast6 (1 ms)

Div

[ RUN      ] VulkanAPITest.div_broadcast5
[       OK ] VulkanAPITest.div_broadcast5 (1 ms)
[ RUN      ] VulkanAPITest.div_broadcast6
[       OK ] VulkanAPITest.div_broadcast6 (2 ms)

All tests:
https://www.internalfb.com/phabricator/paste/view/P781794761

Run clang-format on glsl files and Arithmetic.cpp

Differential Revision: D46874508

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305

@pytorch-bot

pytorch-bot bot commented Jul 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104718

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 05f3818:

NEW FAILURE - The following job has failed:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: vulkan release notes category label Jul 6, 2023
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D46874508

@lucylq lucylq force-pushed the export-D46874508 branch from 3e6e719 to 1d61cf0 Compare July 11, 2023 17:55
@lucylq lucylq force-pushed the export-D46874508 branch from 1d61cf0 to 9b01b2b Compare July 12, 2023 18:36
@lucylq lucylq force-pushed the export-D46874508 branch from 9b01b2b to b8ce8e5 Compare July 12, 2023 18:42
@lucylq lucylq force-pushed the export-D46874508 branch from b8ce8e5 to 78e1abb Compare July 12, 2023 18:51
@lucylq lucylq force-pushed the export-D46874508 branch from 78e1abb to bb174f3 Compare July 13, 2023 23:03
@lucylq lucylq force-pushed the export-D46874508 branch from bb174f3 to 84353d2 Compare July 13, 2023 23:25
@lucylq lucylq force-pushed the export-D46874508 branch from 84353d2 to 9acd84d Compare July 14, 2023 19:01
@lucylq lucylq force-pushed the export-D46874508 branch from 9acd84d to ecdaa1e Compare July 14, 2023 19:11
@SS-JIA SS-JIA self-requested a review July 14, 2023 22:00
@lucylq lucylq force-pushed the export-D46874508 branch from 19a2f88 to 4412a95 Compare July 14, 2023 22:29
@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) labels Jul 17, 2023
@lucylq lucylq force-pushed the export-D46874508 branch from a67f7d9 to 2e3e2f1 Compare July 17, 2023 18:55

…ize arithmetic operators (pytorch#104718)

Summary:
Pull Request resolved: pytorch#104718

## This diff
1. Templatizes the arithmetic operators
2. Adds general broadcasting to arithmetic operators

Follow on diff:
* Templatize the remaining arithmetic operators (scalar, in-place)
* Rename Arithmetic.cpp --> BinaryOps.cpp

## Templatizing arithmetic ops
Create a template so that `add`, `sub`, `mul`, and `div` can be generated from one shader. See Stephen's comment on v2. Note that `div` is a special case, because it has to account for division by 0.
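
The idea of generating several operators from one shader can be illustrated with plain string substitution. The sketch below is an assumption-laden illustration in Python, not the codegen used by this diff: the template text, the `OPERATOR` placeholder, and the variable names inside it are all hypothetical, and the special handling for `div` is left out because its details are not given here.

```python
from string import Template

# Hypothetical one-line shader body; the real GLSL template is not shown here.
BINARY_OP_TEMPLATE = Template("out_texel = in_texel ${OPERATOR} other_texel;")

# One entry per generated shader variant.
OPERATORS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}


def generate_variants(template=BINARY_OP_TEMPLATE, operators=OPERATORS):
    """Return a dict mapping each op name to its specialized source line."""
    return {name: template.substitute(OPERATOR=symbol)
            for name, symbol in operators.items()}


variants = generate_variants()
# variants["add"] == "out_texel = in_texel + other_texel;"
```

Keeping the operator as the only varying piece means a fix to the shared indexing logic applies to all four operators at once.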

## Adding general broadcasting to arithmetic ops
Currently, broadcasting is supported for 4D tensors only in a restricted form: if the batch or channel dimensions of the two tensors are not equal, then both the batch and the channel of one tensor must be 1, i.e.:
```
tensorA NCHW:
5, 2, 3, 3
tensorB NCHW:
1, 1, 3, 3 --> batch=1, channel=1
```
This diff adds broadcast support for 4D tensors whose batch and channel dimensions differ and where the size-1 dimensions are not confined to a single tensor, e.g.:
```
tensorA NCHW:
5, 1, 3, 3
tensorB NCHW:
1, 5, 3, 3
```

Broadcast rules (for each dimension x, at least one of the following must hold):
```
- tensorA.dim()[x] == tensorB.dim()[x]
- tensorA.dim()[x] == 1 || tensorB.dim()[x] == 1
- tensorA.dim()[x] does not exist || tensorB.dim()[x] does not exist
```

Broadcast method:

1. Pass the `output`, `input` and `other` tensors to the shader
2. Iterate over the output texture and compute the value of each texel directly (without repeating the input tensors)
3. Map NHW positions with modulo
4. Map the C position by dividing `pos.z` by `ceil(C/4)`, which maps it back to the original tensor range
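
Step 4 can be sketched as follows, under a stated assumption about the texture layout: that four channels are packed into each texel and that the z axis enumerates batches and channel groups as `z = n * ceil(C/4) + channel_group`. That layout, and the helper names `ceil_div` and `map_texel_z`, are assumptions made for illustration, written in Python rather than GLSL. The sketch also assumes each input dimension is either equal to the output dimension or 1.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return (a + b - 1) // b


def map_texel_z(out_z, out_channels, in_batch, in_channels):
    """Map an output texel z coordinate to the z coordinate of an input.

    Assumes z = n * ceil(C/4) + channel_group for both textures.
    """
    out_groups = ceil_div(out_channels, 4)
    in_groups = ceil_div(in_channels, 4)
    # Dividing by ceil(C/4) recovers the batch index; the remainder is the
    # channel group within that batch.
    n, channel_group = divmod(out_z, out_groups)
    return (n % in_batch) * in_groups + (channel_group % in_groups)


# Output N=5, C=5 (two channel groups per batch), texel z = 7 is n=3, group 1.
z_in_a = map_texel_z(7, out_channels=5, in_batch=5, in_channels=1)  # 3
z_in_b = map_texel_z(7, out_channels=5, in_batch=1, in_channels=5)  # 1
```

When an input has a single channel, a shader would presumably also need to replicate that one lane across the four lanes of the output texel; the sketch only covers the z coordinate.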

 ---
Also refactors some tests to reduce repeated setup code.

Test Plan:
New tests:

Add
```
[ RUN      ] VulkanAPITest.add_broadcast5
[       OK ] VulkanAPITest.add_broadcast5 (0 ms)
[ RUN      ] VulkanAPITest.add_broadcast6
[       OK ] VulkanAPITest.add_broadcast6 (0 ms)
```

Sub
```
[ RUN      ] VulkanAPITest.sub_broadcast5
[       OK ] VulkanAPITest.sub_broadcast5 (0 ms)
[ RUN      ] VulkanAPITest.sub_broadcast6
[       OK ] VulkanAPITest.sub_broadcast6 (0 ms)
```

Mul
```
[ RUN      ] VulkanAPITest.mul_broadcast5
[       OK ] VulkanAPITest.mul_broadcast5 (1 ms)
[ RUN      ] VulkanAPITest.mul_broadcast6
[       OK ] VulkanAPITest.mul_broadcast6 (1 ms)
```

Div
```
[ RUN      ] VulkanAPITest.div_broadcast5
[       OK ] VulkanAPITest.div_broadcast5 (1 ms)
[ RUN      ] VulkanAPITest.div_broadcast6
[       OK ] VulkanAPITest.div_broadcast6 (2 ms)
```

All tests:
https://www.internalfb.com/phabricator/paste/view/P781794761

```
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:6377: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 307 tests from VulkanAPITest (5576 ms total)

[----------] Global test environment tear-down
[==========] 307 tests from 1 test suite ran. (5576 ms total)
[  PASSED  ] 306 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 5 DISABLED TESTS
```

Test Vulkan Delegate on OD:
```
buck2 test 'fbcode//mode/dev' fbcode//executorch/backends/vulkan/test:test_vulkan_delegate -- --exact 'executorch/backends/vulkan/test:test_vulkan_delegate - test_vulkan_backend_add (executorch.backends.vulkan.test.test_vulkan_delegate.TestBackends)'

Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```

Run clang-format on glsl files and Arithmetic.cpp

Reviewed By: SS-JIA

Differential Revision: D46874508

fbshipit-source-id: e2c0f4c4525c5d567c75c2e0a50065e00de24066

@lucylq lucylq force-pushed the export-D46874508 branch from 2e3e2f1 to 05f3818 Compare July 17, 2023 19:03
@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 18, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Labels
ciflow/inductor, ciflow/mps, ciflow/trunk, fb-exported, Merged, module: cpu, module: dynamo, module: inductor, release notes: quantization, release notes: vulkan
4 participants