sampled_addmm: backward performance improvements #103544
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103544
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 21b7dc1.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```cpp
const auto grad_projected = grad.sparse_mask(self);
const auto self_requires_grad = grad_input_mask[0];
const auto mat1_requires_grad = grad_input_mask[1];
const auto mat2_requires_grad = grad_input_mask[2];
return std::make_tuple(
```
@albanD, at this point it is guaranteed that at least one entry of `grad_input_mask` is true, right?
yes, it should be safe to assume that
I'm not sure we want to do that, because this will regress memory usage :/
In particular, if only self requires grad, m1 and m2 won't be saved today. But with this new code, they will always be saved.
I think there was a discussion about fixing this; do you remember where it is, @soulitzer?
Maybe something like `sparse_sampled_addmm_backward(grad, self, optional_save_if(mat2.requires_grad(), mat1), optional_save_if(mat1.requires_grad(), mat2), alpha, beta, grad_input_mask)` and make the backward take `optional<>` args.
We then need to use our pattern matcher for replacement to make a smart decision on saving.
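For illustration only, here is a minimal sketch of what that conditional-saving pattern could look like. `optional_save_if` is a hypothetical helper taken from the comment above (no such helper exists in ATen today), and the backward signature below is an assumption about the shape of the proposal, not the codegen's actual output:

```cpp
#include <ATen/ATen.h>
#include <c10/util/Optional.h>
#include <array>
#include <tuple>

// Hypothetical helper: only hand a tensor to the autograd graph when the
// matching input actually needs it for its gradient.
inline c10::optional<at::Tensor> optional_save_if(bool cond, const at::Tensor& t) {
  return cond ? c10::optional<at::Tensor>(t) : c10::nullopt;
}

// Hypothetical backward signature with optional<> args: mat1 is only needed
// to compute mat2's gradient and vice versa, so either can be dropped when
// the corresponding gradient is not requested.
std::tuple<at::Tensor, at::Tensor, at::Tensor> sparse_sampled_addmm_backward(
    const at::Tensor& grad,
    const at::Tensor& self,
    const c10::optional<at::Tensor>& mat1,
    const c10::optional<at::Tensor>& mat2,
    const at::Scalar& alpha,
    const at::Scalar& beta,
    std::array<bool, 3> grad_input_mask);

// The forward side would then save conditionally, roughly:
//   sparse_sampled_addmm_backward(grad, self,
//       optional_save_if(mat2.requires_grad(), mat1),
//       optional_save_if(mat1.requires_grad(), mat2),
//       alpha, beta, grad_input_mask);
```

As the autogenerated code later in this thread shows, the codegen route that eventually landed (#103750) instead guards the `SavedVariable` construction with `should_compute_output`, but the saving condition is the same: mat1 is kept only when mat2's gradient is needed, and vice versa.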
tools/autograd/derivatives.yaml (outdated)
```diff
- self: maybe_multiply(grad, beta.conj())
- mat1: maybe_multiply(grad.sparse_mask(self).mm(mat2.mH()), alpha.conj())
- mat2: maybe_multiply(mat1.mH().mm(grad.sparse_mask(self)), alpha.conj())
+ self, mat1, mat2: sparse_sampled_addmm_backward(grad, self, mat1, mat2, alpha, beta, grad_input_mask)
```
To avoid any confusion: the gradient wrt `self` is still incorrect. It is fixed in #103548.
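For readers following along, here is a rough standalone sketch of what the fused entry computes, written directly from the formulas above. It is not the actual `FunctionsManual.cpp` implementation, and the `self` line deliberately keeps the (still incorrect, per the note above) pre-#103548 form:

```cpp
#include <ATen/ATen.h>
#include <array>
#include <tuple>

// Sketch of the fused backward: project `grad` onto the sparsity pattern of
// `self` once and reuse the result for both matrix gradients, instead of
// calling sparse_mask() separately in the mat1 and mat2 formulas.
std::tuple<at::Tensor, at::Tensor, at::Tensor> sparse_sampled_addmm_backward_sketch(
    const at::Tensor& grad,
    const at::Tensor& self,
    const at::Tensor& mat1,
    const at::Tensor& mat2,
    const at::Scalar& alpha,
    const at::Scalar& beta,
    std::array<bool, 3> grad_input_mask) {
  const auto grad_projected = grad.sparse_mask(self);
  at::Tensor grad_self, grad_mat1, grad_mat2;
  if (grad_input_mask[0]) {
    grad_self = grad * beta.conj();  // note: still incorrect wrt `self`; see #103548 for the fix
  }
  if (grad_input_mask[1]) {
    grad_mat1 = grad_projected.mm(mat2.mH()) * alpha.conj();
  }
  if (grad_input_mask[2]) {
    grad_mat2 = mat1.mH().mm(grad_projected) * alpha.conj();
  }
  return std::make_tuple(grad_self, grad_mat1, grad_mat2);
}
```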
Here's the issue about multi-output functions saving unnecessary tensors: #97575
@albanD, @soulitzer, would the current code be sufficient for now? Or is there a way to tell the autogen to produce and reuse some intermediates?
There's no way to tell the codegen to do this currently, unfortunately. There needs to be an API to specify what needs to be saved under what conditions, and then some codegen updates to translate that information into extra logic in the VariableType kernel. And what Alban is proposing above could be promising.
What @albanD suggested is already applied here :)
So if we land this PR as-is, we're just trading memory for compute. This trade-off could be worth it, though, if we could say that in practice all the inputs tend to require grad.
Oh, I believe what Alban is suggesting is something slightly different: the `grad_input_mask` is a quantity computed when backward is run, so it would not influence how things are saved during the forward pass.
Ah, I see, I was not aware of that; will change it then.
Good catch!
No need to do double `sparse_mask`, let's squash everything into one call!

This PR exercises #103750, so here is the autogenerated code for the backward pass:

```cpp
at::Tensor sparse_sampled_addmm(c10::DispatchKeySet ks, const at::Tensor & self, const at::Tensor & mat1, const at::Tensor & mat2, const at::Scalar & beta, const at::Scalar & alpha) {
  auto& self_ = unpack(self, "self", 0);
  auto& mat1_ = unpack(mat1, "mat1", 1);
  auto& mat2_ = unpack(mat2, "mat2", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, mat1, mat2 );

  std::shared_ptr<SparseSampledAddmmBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<SparseSampledAddmmBackward0>(new SparseSampledAddmmBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self, mat1, mat2 ));
    grad_fn->alpha = alpha;
    grad_fn->beta = beta;
    if (grad_fn->should_compute_output(2)) {
      grad_fn->mat1_ = SavedVariable(mat1, false);
    }
    if (grad_fn->should_compute_output(1)) {
      grad_fn->mat2_ = SavedVariable(mat2, false);
    }
    grad_fn->self_ = SavedVariable(self, false);
  }
```

As you can see, we do not save tensors unless needed.
@pytorchbot merge
Merge failed. Reason: This PR is missing a required label. To add a label, you can comment to pytorchbot (for example, `@pytorchbot label ...`). For more information, see the linked docs.
Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
No need to do double `sparse_mask`, let's squash everything into one call!

This PR exercises #103750; the autogenerated code for the backward pass is shown in the update above. As you can see, we do not save tensors unless needed.
Stack from ghstack (oldest at bottom):
cc @alexsamardzic @pearu @cpuhrsch @amjames @bhosmer @ezyang @albanD @zou3519 @gqchen @soulitzer @lezcano @Varal7