Multioutput backward formula: allow conditional guards against saving #103750
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103750
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit b76601a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@albanD, @soulitzer, could you please take a look, so that we can potentially improve on top of it if the idea is more or less reasonable?
Update on "Multioutput backward formula: allow conditional guards against saving": Multi-output backward formulas break the ability of autogen to decide which variables have to be stored in the graph. This PR introduces a helper, `wrap_opt_if`, which can be used to hint autogen about variable interdependence. cc ezyang albanD zou3519 gqchen pearu soulitzer Lezcano Varal7
I'll let @soulitzer review this one.
What I had in mind was more of adding a replacement entry here:
pytorch/tools/autograd/load_derivatives.py
Line 758 in ae78e80
REPLACEMENTS: List[Tuple[str, Dict[str, Any]]] = [
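For context, here is a simplified sketch of the mechanism being suggested; the pattern and the table entry below are illustrative and are not the actual `load_derivatives.py` contents. A replacement table rewrites sub-expressions of a derivative formula before codegen decides what to save:

```python
import re
from typing import Any, Dict, List, Tuple

# Illustrative only: a tiny replacement table in the spirit of REPLACEMENTS.
# Each entry maps a regex (parameterized by the variable name) to information
# about what to substitute, e.g. "<var>.sizes()" -> a separately saved "<var>_sizes".
REPLACEMENTS_SKETCH: List[Tuple[str, Dict[str, Any]]] = [
    (r"{}\.sizes\(\)", {"suffix": "_sizes"}),
]

def apply_replacements(var: str, formula: str) -> str:
    # Rewrite every occurrence of the pattern for this variable in the formula.
    for pattern, info in REPLACEMENTS_SKETCH:
        formula = re.sub(pattern.format(var), var + info["suffix"], formula)
    return formula

print(apply_replacements("self", "grad * self.sizes()[0]"))  # grad * self_sizes[0]
```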
@albanD, nope, that did not work... The substituted formula is then being parsed by …
```cpp
// Wraps `t` into an optional: returns `t` if `cond` is true, otherwise nullopt.
inline c10::optional<Tensor> wrap_opt_if(const Tensor& t, const bool cond) {
  using OptTensor = c10::optional<Tensor>;
  return cond ? OptTensor(t) : static_cast<OptTensor>(c10::nullopt);
}
```
For context: this one is used in the next PR up in the stack, for `sparse_sampled_addmm_backward`.
If codegen did its job correctly, we would always get an undefined tensor t here, right? Maybe we can assert that here.
No, not really, unfortunately. This code runs at backward compute time. But we can certainly assert inside the backward implementations to test both the conditions and the saving behavior.
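For illustration, here is a pure-Python analogue of the pattern under discussion; this is a minimal sketch using a custom `torch.autograd.Function`, not the PR's code. Each input is saved only when some requested gradient actually needs it, and that invariant is asserted inside `backward`:

```python
import torch

class CondSaveMul(torch.autograd.Function):
    """Sketch: y = a * b, saving each factor only if the other one may need a gradient."""

    @staticmethod
    def forward(ctx, a, b):
        # grad_a needs b and grad_b needs a, so each factor is saved only if
        # the other input requires grad (the analogue of a wrap_opt_if guard).
        ctx.save_for_backward(
            a if b.requires_grad else None,
            b if a.requires_grad else None,
        )
        return a * b

    @staticmethod
    def backward(ctx, grad_out):
        a_saved, b_saved = ctx.saved_tensors
        grad_a = grad_b = None
        if ctx.needs_input_grad[0]:
            # The kind of assertion discussed above: if this gradient is
            # requested, the tensor it depends on must have been saved.
            assert b_saved is not None
            grad_a = grad_out * b_saved
        if ctx.needs_input_grad[1]:
            assert a_saved is not None
            grad_b = grad_out * a_saved
        return grad_a, grad_b

a = torch.randn(3, requires_grad=True)
b = torch.randn(3)  # b's gradient is never requested, so `a` is never saved
CondSaveMul.apply(a, b).sum().backward()
print(a.grad.shape)  # torch.Size([3])
```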
oh whoops, good point.
The tests are missing though; I will figure them out later.
Oh, sad. The new formula should just contain …
Tried that, it did not work either :( Hence I had to dig deeper. But let me try again, I might have done things wrong...
@albanD, right, but the expressions needed use several variables, so I need to match the saving condition and back-substitute it into the expression. Back-substitution seems possible only in …
Looks pretty good! For testing, maybe just apply this to the multi-output formulas we already have?
Sounds good! Thanks for looking into it in detail.
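A rough sanity check in that spirit might look like the following; this is a sketch, not the PR's actual test, and the shapes and the expand/sumdim arguments simply mirror how `torch.nn.functional.bilinear` uses `torch._trilinear`:

```python
import torch

# Only i1 requires grad, so with the conditional guards the graph does not
# need to save i1 itself; gradcheck must still pass.
i1 = torch.randn(3, 4, dtype=torch.double, requires_grad=True)   # (N, D1)
i2 = torch.randn(5, 4, 6, dtype=torch.double)                    # (O, D1, D2)
i3 = torch.randn(3, 6, dtype=torch.double)                       # (N, D2)

torch.autograd.gradcheck(
    lambda t: torch._trilinear(t, i2, i3, [1, 3], [0], [1, 2], [2, 3]),  # -> (N, O)
    (i1,),
)
```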
@soulitzer, could you please have a look at the comment?
comment looks great, small nit
tools/autograd/derivatives.yaml
```
@@ -110,6 +110,25 @@
#   destroy saved buffers if we know variables are not going to be retained,
#   e.g., it is used by _cudnn_rnn
#
# In a gradient expression, the following functions are in scope:
```
We don't need this additional header here, right? It kind of sounds like this is the only function in scope in the gradient expression.
This is true, but the section above is about variables. I could move it there while replacing "variables" with "variables/functions". What do you think?
Ahh, my bad, I didn't realize the difference; combining sounds good to me!
@soulitzer, the nit is addressed. If there is anything else, please let me know. Otherwise it is ready to go :)
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see …
Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
No need to do double `sparse_mask`, let's squash everything into one call! This PR exercises #103750, so here is the autogenerated code for the backward pass:

```cpp
at::Tensor sparse_sampled_addmm(c10::DispatchKeySet ks, const at::Tensor & self, const at::Tensor & mat1, const at::Tensor & mat2, const at::Scalar & beta, const at::Scalar & alpha) {
  auto& self_ = unpack(self, "self", 0);
  auto& mat1_ = unpack(mat1, "mat1", 1);
  auto& mat2_ = unpack(mat2, "mat2", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, mat1, mat2 );
  std::shared_ptr<SparseSampledAddmmBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<SparseSampledAddmmBackward0>(new SparseSampledAddmmBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self, mat1, mat2 ));
    grad_fn->alpha = alpha;
    grad_fn->beta = beta;
    if (grad_fn->should_compute_output(2)) {
      grad_fn->mat1_ = SavedVariable(mat1, false);
    }
    if (grad_fn->should_compute_output(1)) {
      grad_fn->mat2_ = SavedVariable(mat2, false);
    }
    grad_fn->self_ = SavedVariable(self, false);
  }
```

As you can see, we do not save tensors unless needed.

cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang albanD zou3519 gqchen soulitzer Lezcano Varal7

Pull Request resolved: #103544
Approved by: https://github.com/soulitzer
Multi-output backward formulas break the ability of autogen to decide which variables have to be stored in the graph. This PR introduces a helper, `wrap_opt_if`, which can be used to hint autogen about variable interdependence.

For example, the following code is generated for `_trilinear` with this modification:

```cpp
at::Tensor _trilinear(c10::DispatchKeySet ks, const at::Tensor & i1, const at::Tensor & i2, const at::Tensor & i3, at::IntArrayRef expand1, at::IntArrayRef expand2, at::IntArrayRef expand3, at::IntArrayRef sumdim, int64_t unroll_dim) {
  auto& i1_ = unpack(i1, "i1", 0);
  auto& i2_ = unpack(i2, "i2", 1);
  auto& i3_ = unpack(i3, "i3", 2);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( i1, i2, i3 );
  [[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(i1) || isFwGradDefined(i2) || isFwGradDefined(i3));
  std::shared_ptr<TrilinearBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<TrilinearBackward0>(new TrilinearBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( i1, i2, i3 ));
    grad_fn->expand1 = expand1.vec();
    grad_fn->expand2 = expand2.vec();
    grad_fn->expand3 = expand3.vec();
    if (grad_fn->should_compute_output(1) || grad_fn->should_compute_output(2)) {
      grad_fn->i1_ = SavedVariable(i1, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(2)) {
      grad_fn->i2_ = SavedVariable(i2, false);
    }
    if (grad_fn->should_compute_output(0) || grad_fn->should_compute_output(1)) {
      grad_fn->i3_ = SavedVariable(i3, false);
    }
    grad_fn->sumdim = sumdim.vec();
  }
```

with the following backward modifications:

```diff
 - name: _trilinear(Tensor i1, Tensor i2, Tensor i3, int[] expand1, int[] expand2, int[] expand3, int[] sumdim, int unroll_dim=1) -> Tensor
-  i1, i2, i3: _trilinear_backward(grad, i1, i2, i3, expand1, expand2, expand3, sumdim, grad_input_mask)
+  i1, i2, i3: "_trilinear_backward(grad,
+               wrap_opt_if(i1, grad_input_mask[1] || grad_input_mask[2]),
+               wrap_opt_if(i2, grad_input_mask[0] || grad_input_mask[2]),
+               wrap_opt_if(i3, grad_input_mask[0] || grad_input_mask[1]),
+               expand1, expand2, expand3, sumdim, grad_input_mask)"
```
Stack from ghstack (oldest at bottom):
cc @ezyang @albanD @zou3519 @gqchen @pearu @soulitzer @lezcano @Varal7 @bhosmer @bdhirsh
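For completeness, on a build that includes this change the effect can be observed from Python with saved-tensor hooks; this is a sketch, with shapes and `torch._trilinear` arguments chosen only for illustration. Since only `i1` requires grad, the generated guards save `i2` and `i3` but not `i1`:

```python
import torch

packed_shapes = []

def pack(t):
    # Record the shape of every tensor the autograd graph decides to save.
    packed_shapes.append(tuple(t.shape))
    return t

def unpack(t):
    return t

i1 = torch.randn(3, 4, dtype=torch.double, requires_grad=True)   # (N, D1)
i2 = torch.randn(5, 4, 6, dtype=torch.double)                    # (O, D1, D2)
i3 = torch.randn(3, 6, dtype=torch.double)                       # (N, D2)

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    out = torch._trilinear(i1, i2, i3, [1, 3], [0], [1, 2], [2, 3])

# With the conditional guards, i1's shape (3, 4) should not appear here,
# while the shapes of i2 and i3 (needed for i1's gradient) should.
print(packed_shapes)
```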