Multioutput backward formula: allow conditional guards against saving #103750
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103750
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit b76601a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@albanD, @soulitzer, could you please take a look, so that we can potentially improve on top of it if the idea is more or less reasonable?
Update on "Multioutput backward formula: allow conditional guards against saving": Multi-output backward formulas break the ability of autogen to decide which variables have to be stored in the graph. This PR introduces a helper, `wrap_opt_if`, which can be used to hint autogen about variable interdependence. cc ezyang albanD zou3519 gqchen pearu soulitzer Lezcano Varal7
I'll let @soulitzer review this one.
What I had in mind was more of adding a replacement entry here:
pytorch/tools/autograd/load_derivatives.py
Line 758 in ae78e80
REPLACEMENTS: List[Tuple[str, Dict[str, Any]]] = [
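For context, here is a simplified sketch of the mechanism being suggested; the pattern and the table entry below are illustrative and are not the actual `load_derivatives.py` contents. A replacement table rewrites sub-expressions of a derivative formula before codegen decides what to save:

```python
import re
from typing import Any, Dict, List, Tuple

# Illustrative only: a tiny replacement table in the spirit of REPLACEMENTS.
# Each entry maps a regex (parameterized by the variable name) to information
# about what to substitute, e.g. "<var>.sizes()" -> a separately saved "<var>_sizes".
REPLACEMENTS_SKETCH: List[Tuple[str, Dict[str, Any]]] = [
    (r"{}\.sizes\(\)", {"suffix": "_sizes"}),
]

def apply_replacements(var: str, formula: str) -> str:
    # Rewrite every occurrence of the pattern for this variable in the formula.
    for pattern, info in REPLACEMENTS_SKETCH:
        formula = re.sub(pattern.format(var), var + info["suffix"], formula)
    return formula

print(apply_replacements("self", "grad * self.sizes()[0]"))  # grad * self_sizes[0]
```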
@albanD, nope, that did not work... The substituted formula is then being parsed by …
```cpp
// Wraps `t` into an optional: returns `t` if `cond` is true, otherwise nullopt.
inline c10::optional<Tensor> wrap_opt_if(const Tensor& t, const bool cond) {
  using OptTensor = c10::optional<Tensor>;
  return cond ? OptTensor(t) : static_cast<OptTensor>(c10::nullopt);
}
```
For context: this one is used in the next PR up in the stack, for `sparse_sampled_addmm_backward`.
If codegen did its job correctly, we would always get an undefined tensor t here, right? Maybe we can assert that here.
No, not really, unfortunately. This code runs at backward compute time. But we can certainly assert inside the backward implementations to test both the conditions and the saving behavior.
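For illustration, here is a pure-Python analogue of the pattern under discussion; this is a minimal sketch using a custom `torch.autograd.Function`, not the PR's code. Each input is saved only when some requested gradient actually needs it, and that invariant is asserted inside `backward`:

```python
import torch

class CondSaveMul(torch.autograd.Function):
    """Sketch: y = a * b, saving each factor only if the other one may need a gradient."""

    @staticmethod
    def forward(ctx, a, b):
        # grad_a needs b and grad_b needs a, so each factor is saved only if
        # the other input requires grad (the analogue of a wrap_opt_if guard).
        ctx.save_for_backward(
            a if b.requires_grad else None,
            b if a.requires_grad else None,
        )
        return a * b

    @staticmethod
    def backward(ctx, grad_out):
        a_saved, b_saved = ctx.saved_tensors
        grad_a = grad_b = None
        if ctx.needs_input_grad[0]:
            # The kind of assertion discussed above: if this gradient is
            # requested, the tensor it depends on must have been saved.
            assert b_saved is not None
            grad_a = grad_out * b_saved
        if ctx.needs_input_grad[1]:
            assert a_saved is not None
            grad_b = grad_out * a_saved
        return grad_a, grad_b

a = torch.randn(3, requires_grad=True)
b = torch.randn(3)  # b's gradient is never requested, so `a` is never saved
CondSaveMul.apply(a, b).sum().backward()
print(a.grad.shape)  # torch.Size([3])
```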
oh whoops, good point.
The tests are missing though; I will figure them out later.
Oh, sad. The new formula should just contain …
Tried that, it did not work either :( Hence I had to dig deeper. But let me try again, I might have done things wrong...
@albanD, right, but the expressions needed use several variables, so I need to match the saving condition and back-substitute it into the expression. Back-substitution seems possible only in …
Looks pretty good! For testing, maybe just apply this to the multi-output formulas we already have?
Sounds good! Thanks for looking into it in detail.
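A rough sanity check in that spirit might look like the following; this is a sketch, not the PR's actual test, and the shapes and the expand/sumdim arguments simply mirror how `torch.nn.functional.bilinear` uses `torch._trilinear`:

```python
import torch

# Only i1 requires grad, so with the conditional guards the graph does not
# need to save i1 itself; gradcheck must still pass.
i1 = torch.randn(3, 4, dtype=torch.double, requires_grad=True)   # (N, D1)
i2 = torch.randn(5, 4, 6, dtype=torch.double)                    # (O, D1, D2)
i3 = torch.randn(3, 6, dtype=torch.double)                       # (N, D2)

torch.autograd.gradcheck(
    lambda t: torch._trilinear(t, i2, i3, [1, 3], [0], [1, 2], [2, 3]),  # -> (N, O)
    (i1,),
)
```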
@soulitzer, could you please have a look at the comment?
comment looks great, small nit
tools/autograd/derivatives.yaml
```
@@ -110,6 +110,25 @@
#   destroy saved buffers if we know variables are not going to be retained,
#   e.g., it is used by _cudnn_rnn
#
# In a gradient expression, the following functions are in scope:
```
We don't need this additional header here, right? It kind of sounds like this is the only function in scope in the gradient expression.
This is true, but the section above is about variables. I could move it there while replacing "variables" with "variables/functions". What do you think?
Ahh, my bad, I didn't realize the difference; combining sounds good to me!
@soulitzer, the nit is addressed. If there is anything else, please let me know. Otherwise it is ready to go :)
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see …
Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
No need to do double `sparse_mask`, let's squash everything into one call! This PR exercises #103750, so here is the autogenerated code for the backward pass:

```cpp
at::Tensor sparse_sampled_addmm(c10::DispatchKeySet ks, const at::Tensor & self, const at::Tensor & mat1, const at::Tensor & mat2, const at::Scalar & beta, const at::Scalar & alpha) {
  auto& self_ = unpack(self, "self", 0);
  auto& mat1_ = unpack(mat1, "mat1", 1);
  auto& mat2_ = unpack(mat2, "mat2", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, mat1, mat2 );
  std::shared_ptr<SparseSampledAddmmBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<SparseSampledAddmmBackward0>(new SparseSampledAddmmBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self, mat1, mat2 ));
    grad_fn->alpha = alpha;
    grad_fn->beta = beta;
    if (grad_fn->should_compute_output(2)) {
      grad_fn->mat1_ = SavedVariable(mat1, false);
    }
    if (grad_fn->should_compute_output(1)) {
      grad_fn->mat2_ = SavedVariable(mat2, false);
    }
    grad_fn->self_ = SavedVariable(self, false);
  }
```

As you can see, we do not save tensors unless needed.

cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang albanD zou3519 gqchen soulitzer Lezcano Varal7

Pull Request resolved: #103544
Approved by: https://github.com/soulitzer
Multi-output backward formulas break the ability of autogen to decide which variables have to be stored in the graph. This PR introduces a helper, `wrap_opt_if`, which can be used to hint autogen about variable interdependence.

For example, the following code is generated for `_trilinear` with this modification:

```cpp
at::Tensor _trilinear(c10::DispatchKeySet ks, const at::Tensor & i1, const at::Tensor & i2, const at::Tensor & i3, at::IntArrayRef expand1, at::IntArrayRef expand2, at::IntArrayRef expand3, at::IntArrayRef sumdim, int64_t unroll_dim) {
  auto& i1_ = unpack(i1, "i1", 0);
  auto& i2_ = unpack(i2, "i2", 1);
  auto& i3_ = unpack(i3, "i3", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( i1, i2, i3 );
  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(i1) || isFwGradDefined(i2) || isFwGradDefined(i3));
  std::shared_ptr<TrilinearBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<TrilinearBackward0>(new TrilinearBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( i1, i2, i3 ));
    grad_fn->expand1 = expand1.vec();
    grad_fn->expand2 = expand2.vec();
    grad_fn->expand3 = expand3.vec();
    if (grad_fn->should_compute_output(1) || grad_fn->should_compute_output(2)) {
      grad_fn->i1_ = SavedVariable(i1, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(2)) {
      grad_fn->i2_ = SavedVariable(i2, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(1)) {
      grad_fn->i3_ = SavedVariable(i3, false);
    }
    grad_fn->sumdim = sumdim.vec();
  }
```

with the following backward modifications:

```diff
 - name: _trilinear(Tensor i1, Tensor i2, Tensor i3, int[] expand1, int[] expand2, int[] expand3, int[] sumdim, int unroll_dim=1) -> Tensor
-  i1, i2, i3: _trilinear_backward(grad, i1, i2, i3, expand1, expand2, expand3, sumdim, grad_input_mask)
+  i1, i2, i3: "_trilinear_backward(grad,
+               wrap_opt_if(i1, grad_input_mask[1] || grad_input_mask[2]),
+               wrap_opt_if(i2, grad_input_mask[0] || grad_input_mask[2]),
+               wrap_opt_if(i3, grad_input_mask[0] || grad_input_mask[1]),
+               expand1, expand2, expand3, sumdim, grad_input_mask)"
```
Stack from ghstack (oldest at bottom):
cc @ezyang @albanD @zou3519 @gqchen @pearu @soulitzer @lezcano @Varal7 @bhosmer @bdhirsh
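For completeness, on a build that includes this change the effect can be observed from Python with saved-tensor hooks; this is a sketch, with shapes and `torch._trilinear` arguments chosen only for illustration. Since only `i1` requires grad, the generated guards save `i2` and `i3` but not `i1`:

```python
import torch

packed_shapes = []

def pack(t):
    # Record the shape of every tensor the autograd graph decides to save.
    packed_shapes.append(tuple(t.shape))
    return t

def unpack(t):
    return t

i1 = torch.randn(3, 4, dtype=torch.double, requires_grad=True)   # (N, D1)
i2 = torch.randn(5, 4, 6, dtype=torch.double)                    # (O, D1, D2)
i3 = torch.randn(3, 6, dtype=torch.double)                       # (N, D2)

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    out = torch._trilinear(i1, i2, i3, [1, 3], [0], [1, 2], [2, 3])

# With the conditional guards, i1's shape (3, 4) should not appear here,
# while the shapes of i2 and i3 (needed for i1's gradient) should.
print(packed_shapes)
```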