
Conversation

@nikitaved (Collaborator) commented Jun 16, 2023

Multi-output backward formulas prevent autogen from deciding which variables need to be saved in the graph.
This PR introduces a helper, `wrap_opt_if`, which can be used to hint autogen about the interdependence of variables.
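To make the problem concrete: before this change, a multi-output formula forced every input it references to be saved unconditionally. A sketch of the pre-change saving code for `_trilinear` (inferred from the discussion below, not copied from actual codegen output):

```
// Pre-change behavior (sketch): every input referenced by the multi-output
// formula is saved, regardless of which gradients will actually be computed.
grad_fn->i1_ = SavedVariable(i1, false);
grad_fn->i2_ = SavedVariable(i2, false);
grad_fn->i3_ = SavedVariable(i3, false);
```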

For example, the following code is generated for `_trilinear` with this modification:

```
at::Tensor _trilinear(c10::DispatchKeySet ks, const at::Tensor & i1, const at::Tensor & i2, const at::Tensor & i3, at::IntArrayRef expand1, at::IntArrayRef expand2, at::IntArrayRef expand3, at::IntArrayRef sumdim, int64_t unroll_dim) {
  auto& i1_ = unpack(i1, "i1", 0);
  auto& i2_ = unpack(i2, "i2", 1);
  auto& i3_ = unpack(i3, "i3", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( i1, i2, i3 );

  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(i1) || isFwGradDefined(i2) || isFwGradDefined(i3));
  std::shared_ptr<TrilinearBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<TrilinearBackward0>(new TrilinearBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( i1, i2, i3 ));
    grad_fn->expand1 = expand1.vec();
    grad_fn->expand2 = expand2.vec();
    grad_fn->expand3 = expand3.vec();
    if (grad_fn->should_compute_output(1) || grad_fn->should_compute_output(2)) {
      grad_fn->i1_ = SavedVariable(i1, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(2)) {
      grad_fn->i2_ = SavedVariable(i2, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(1)) {
      grad_fn->i3_ = SavedVariable(i3, false);
    }
    grad_fn->sumdim = sumdim.vec();
  }
```

with the following backward modifications:

```
 - name: _trilinear(Tensor i1, Tensor i2, Tensor i3, int[] expand1, int[] expand2, int[] expand3, int[] sumdim, int unroll_dim=1) -> Tensor
  - i1, i2, i3: _trilinear_backward(grad, i1, i2, i3, expand1, expand2, expand3, sumdim, grad_input_mask)
  + i1, i2, i3: "_trilinear_backward(grad,
  +             wrap_opt_if(i1, grad_input_mask[1] || grad_input_mask[2]),
  +             wrap_opt_if(i2, grad_input_mask[0] || grad_input_mask[2]),
  +             wrap_opt_if(i3, grad_input_mask[0] || grad_input_mask[1]),
  +             expand1, expand2, expand3, sumdim, grad_input_mask)"
```
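For orientation, here is a minimal sketch of the helper together with a hypothetical multi-output backward consuming the optionally-saved inputs; `example_backward`, its signature, and the gradient math are illustrative assumptions, not the real `_trilinear_backward`:

```
#include <array>
#include <tuple>

#include <ATen/ATen.h>
#include <c10/util/Optional.h>

// The helper added by this PR (its definition is also quoted in the review
// thread below): wrap a tensor into an optional only when `cond` holds, so
// codegen can tie saving a tensor to the gradients that actually need it.
inline c10::optional<at::Tensor> wrap_opt_if(const at::Tensor& t, const bool cond) {
  using OptTensor = c10::optional<at::Tensor>;
  return cond ? OptTensor(t) : static_cast<OptTensor>(c10::nullopt);
}

// Hypothetical two-input backward: `a` is needed only for the gradient of `b`
// and vice versa, mirroring the grad_input_mask conditions used above.
std::tuple<at::Tensor, at::Tensor> example_backward(
    const at::Tensor& grad,
    const c10::optional<at::Tensor>& a_opt,  // saved iff grad_input_mask[1]
    const c10::optional<at::Tensor>& b_opt,  // saved iff grad_input_mask[0]
    std::array<bool, 2> grad_input_mask) {
  at::Tensor grad_a, grad_b;
  if (grad_input_mask[0]) {
    // As long as the wrap_opt_if condition matches this use, b_opt is defined here.
    grad_a = grad * b_opt.value();
  }
  if (grad_input_mask[1]) {
    grad_b = grad * a_opt.value();
  }
  return std::make_tuple(grad_a, grad_b);
}
```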

Stack from ghstack (oldest at bottom):

cc @ezyang @albanD @zou3519 @gqchen @pearu @soulitzer @lezcano @Varal7 @bhosmer @bdhirsh

@pytorch-bot (bot) commented Jun 16, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103750

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b76601a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@nikitaved (Collaborator, Author) commented Jun 16, 2023

@albanD, @soulitzer, could you please have a look, so that we can potentially improve on top of it if the idea is more or less reasonable?

@nikitaved added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Jun 16, 2023
@albanD (Collaborator) left a comment:

I'll let @soulitzer review this one.
What I had in mind was more along the lines of adding a replacement entry here:

`REPLACEMENTS: List[Tuple[str, Dict[str, Any]]] = [`

but maybe that doesn't work?

@nikitaved (Collaborator, Author) commented Jun 16, 2023

@albanD, nope, that did not work... The substituted formula is then parsed by `guard_for`, which saves everything for multi-output formulas by default.

Comment on lines +41 to +45
```
inline c10::optional<Tensor> wrap_opt_if(const Tensor& t, const bool cond) {
  using OptTensor = c10::optional<Tensor>;
  return cond ? OptTensor(t) : static_cast<OptTensor>(c10::nullopt);
}
```

@nikitaved (Collaborator, Author) — Jun 16, 2023:

For context: this one is used in the next PR up the stack, for `sparse_sampled_addmm_backward`.

Contributor:

If codegen did its job correctly, we would always get an undefined tensor `t` here, right? Maybe we can assert for that here.

@nikitaved (Collaborator, Author) — Jun 20, 2023:

No, not really, unfortunately. This code runs at backward compute time. But we can certainly assert inside the backward implementations to test both the conditions and the savings.
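A minimal sketch of what such an assert inside a backward implementation could look like; `b_opt` and `grad_input_mask` are placeholder names from the hypothetical example above, not actual PyTorch symbols:

```
// Illustrative only: check that the saving condition written in
// derivatives.yaml matches how the backward actually uses the tensor.
// TORCH_INTERNAL_ASSERT comes from <c10/util/Exception.h>.
TORCH_INTERNAL_ASSERT(
    !grad_input_mask[0] || b_opt.has_value(),
    "b was not saved even though computing this gradient requires it");
```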

Contributor:

oh whoops, good point.

@nikitaved (Collaborator, Author):

The tests are missing, though; I will figure them out later.

@albanD (Collaborator) commented Jun 16, 2023

> nope, that did not work... The substituted formula is then parsed by `guard_for`, which saves everything for multi-output formulas by default.

Oh, sad. The new formula should just contain `arg_opt`, right? Then the expression provided will be used to generate what is given to the `SavedVariable`.
Similar to how `self->sym_sizes()` is replaced by `self_sym_sizes_opt` in the formula, and the saving formula becomes `grad_fn->bias_sym_sizes_opt = bias.has_value() ? c10::optional<c10::SymIntArrayRef>(bias->sym_sizes()) : c10::nullopt;`, where the decision whether to save something is made during the forward.
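For contrast, with `wrap_opt_if` the decision is keyed not on whether an optional argument was passed to the forward, but on which gradient outputs will be needed; the generated guard (taken from the `_trilinear` output in the PR description) keys on `should_compute_output`:

```
if (grad_fn->should_compute_output(1) || grad_fn->should_compute_output(2)) {
  grad_fn->i1_ = SavedVariable(i1, false);
}
```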

@nikitaved (Collaborator, Author) commented Jun 16, 2023

Tried that as well, it did not work :( Hence I had to dig deeper. But let me try again, I might have done things wrong...

@nikitaved (Collaborator, Author) commented Jun 17, 2023

@albanD, right, but the required expressions use several variables, so I need to match and back-substitute the saving condition into the expression. Back-substitution seems possible only in `res`, but that is not what we need, right?

@soulitzer (Contributor):

Looks pretty good! For testing, maybe just apply this to the multi-output formulas we already have?
Could you also add the codegen output before/after to the PR description.

@albanD (Collaborator) commented Jun 20, 2023

Sounds good! Thanks for looking into it in detail.

nikitaved added a commit that referenced this pull request Jun 26, 2023
No need to do double `sparse_mask`, let's squash everything into one call!
This PR exercises #103750, so here is the autogenerated code for the backward pass.

```
at::Tensor sparse_sampled_addmm(c10::DispatchKeySet ks, const at::Tensor & self, const at::Tensor & mat1, const at::Tensor & mat2, const at::Scalar & beta, const at::Scalar & alpha) {
  auto& self_ = unpack(self, "self", 0);
  auto& mat1_ = unpack(mat1, "mat1", 1);
  auto& mat2_ = unpack(mat2, "mat2", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, mat1, mat2 );

  std::shared_ptr<SparseSampledAddmmBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<SparseSampledAddmmBackward0>(new SparseSampledAddmmBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self, mat1, mat2 ));
    grad_fn->alpha = alpha;
    grad_fn->beta = beta;
    if (grad_fn->should_compute_output(2)) {
      grad_fn->mat1_ = SavedVariable(mat1, false);
    }
    if (grad_fn->should_compute_output(1)) {
      grad_fn->mat2_ = SavedVariable(mat2, false);
    }
    grad_fn->self_ = SavedVariable(self, false);
  }

```

As you can see, we do not save tensors unless needed.




cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang albanD zou3519 gqchen soulitzer Lezcano Varal7

[ghstack-poisoned]
@nikitaved (Collaborator, Author):

@soulitzer, could you please have a look at the comment?

@soulitzer (Contributor) left a comment:

comment looks great, small nit

```
@@ -110,6 +110,25 @@
# destroy saved buffers if we know variables are not going to be retained,
# e.g., it is used by _cudnn_rnn
#
# In a gradient expression, the following functions are in scope:
```
Contributor:

We don't need this additional header here, right? It kind of sounds like this is the only function in scope in the gradient expression.

@nikitaved (Collaborator, Author) — Jun 26, 2023:

This is true, but the section above is about variables. I could move it there while replacing variables with variables/functions. What do you think?

Contributor:

Ahh, my bad, I didn't realize the difference; combining sounds good to me!

@nikitaved (Collaborator, Author):

@soulitzer, the nit is addressed. If there is anything else, please let me know. Otherwise it is ready to go :)

@nikitaved (Collaborator, Author):

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge failed

Reason: This PR needs a `release notes:` label.
If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@nikitaved added the module: codegen (Issues related to the codegen for Aten and Autograd) and topic: not user facing (topic category) labels on Jun 27, 2023
@nikitaved (Collaborator, Author):

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Jun 28, 2023
Pull Request resolved: #103544
Approved by: https://github.com/soulitzer
@facebook-github-bot deleted the gh/nikitaved/56/head branch on July 1, 2023 14:17