
Conversation

angelayi
Contributor

Turns out some graphs will result in getattr nodes...so let's serialize them
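For context, a minimal sketch of the kind of program that produces get_attr nodes: a plain tensor attribute that is neither a Parameter nor a buffer (module name and shapes here are made up for illustration):

import torch

# Hypothetical example: `self.const` is a plain tensor attribute, so the
# exported graph can reference it through a get_attr node instead of a
# lifted placeholder input.
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.const = torch.ones(2, 3)

    def forward(self, x):
        return x + self.const

ep = torch.export.export(M(), (torch.randn(2, 3),))
print(ep.graph_module.graph)  # may show a get_attr node for `const`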

@pytorch-bot

pytorch-bot bot commented Aug 25, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107924

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit be3d23e with merge base b4c6c4d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@zhxchen17 zhxchen17 left a comment


Overall I think it's reasonable to have get_attr in the graph. In fact, I had forgotten about this corner case.

@angelayi angelayi requested a review from zhxchen17 August 25, 2023 05:32
@@ -250,6 +250,7 @@ class GraphModule:
# TODO(zhxchen17) Merge call_spec into call graph.
call_spec: CallSpec
module_call_graph: List[ModuleCallEntry]
constants: str
Contributor


Do we need to bump schema version?

Contributor Author

@angelayi angelayi Aug 25, 2023


I was thinking no, because we haven't published anything yet :P, but I would bump it if a schema change is made after today.

Contributor


OK so you're going to store attributes that are tensors here. Didn't we have a general place to store tensors by reference in the IR as well?
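To make the discussion concrete, a minimal sketch of how tensor attributes could be packed into the new constants: str field via torch.save (helper names and the base64 encoding are assumptions for illustration, not the PR's actual implementation):

import base64
import io
from typing import Dict

import torch

def serialize_constants(constants: Dict[str, torch.Tensor]) -> str:
    # Pack the name -> tensor mapping with torch.save and encode it as text
    # so it can live in a string-typed schema field.
    buf = io.BytesIO()
    torch.save(constants, buf)
    return base64.b64encode(buf.getvalue()).decode("utf-8")

def deserialize_constants(payload: str) -> Dict[str, torch.Tensor]:
    # Inverse of serialize_constants: decode the text and torch.load the dict.
    buf = io.BytesIO(base64.b64decode(payload))
    return torch.load(buf)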

@angelayi angelayi added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 25, 2023
@angelayi angelayi requested a review from gmagogsfm August 25, 2023 15:49
@facebook-github-bot
Contributor

@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@avikchaudhuri avikchaudhuri left a comment


Looks good, a few questions.

elif isinstance(attr, torch.fx.GraphModule):
with self.save_graph_state():
graph = self.serialize_graph(attr)
return Argument.create(as_graph=GraphArgument(name=arg.target, graph=graph))
Contributor


I wonder if in the future it would make sense to store graphs by name as well.

Contributor


I think either way is fine, but graph arguments are more consistent with fx.GraphModule and other existing formats (e.g. ONNX).
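For illustration only, the two options roughly correspond to these schema shapes (these are not the actual export schema definitions; GraphRefArgument is a hypothetical name):

from dataclasses import dataclass

@dataclass
class GraphArgument:
    # Option taken in this PR: the subgraph is embedded inline with the argument.
    name: str
    graph: "Graph"  # full serialized graph payload

@dataclass
class GraphRefArgument:
    # Hypothetical alternative: refer to the subgraph by name and store it
    # once in a top-level table keyed by that name.
    name: str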


Contributor

@SherlockNoMad SherlockNoMad left a comment


Sorry, I would need to block this diff.
Let's consider Tensor serialization holistically with the AOTInductor and Model Processing workstreams.

cc @suo, @muchulee8

@gmagogsfm
Contributor


The serialization format is a private implementation detail hidden behind the torch.export.save/load API, so we have the freedom to evolve it.

Since there is no support yet for serializing or deserializing constant tensors like these, this change doesn't make the situation any worse than it is today (no support at all).

Can we land this (torch.save-based) solution now, so that torch.export.save() doesn't silently lose constant tensors (a severe bug) in time for the 2.1 branch cut? We can then immediately evolve it in a BC-compatible way.
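A rough sketch of the round trip this is meant to preserve (module, shapes, and the file name are placeholders):

import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Constant tensor attribute (not a Parameter or buffer) that should
        # survive torch.export.save / torch.export.load.
        self.scale = torch.tensor(2.0)

    def forward(self, x):
        return x * self.scale

ep = torch.export.export(M(), (torch.randn(4),))
torch.export.save(ep, "m.pt2")
loaded = torch.export.load("m.pt2")
print(loaded.graph_module.graph)  # the constant should still be reachable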

@SherlockNoMad
Contributor


OK, but we need to make it clear that we DO NOT provide any BC guarantee on this exported version of the weights! My concern is that once this weight format leaks into a production service, we will be tied to it, and it will become very hard to change!

Note that this is a different problem from state_dict serialization! How weights in the state_dict are serialized and deserialized is not specified by the PT2 IR, but this constant tensor serialization/deserialization is part of the IR spec.

@gmagogsfm
Contributor


Yeah, we added some scary warnings in the PyTorch docs:

.. warning::

@SherlockNoMad SherlockNoMad self-requested a review August 25, 2023 19:40
@SherlockNoMad
Contributor


Ok, I removed the block. Let's come up with a proper solution soon.

@gmagogsfm
Contributor


Yep, let's do it next week @angelayi.

@angelayi angelayi requested a review from zhxchen17 August 25, 2023 21:13
@SherlockNoMad
Contributor

Another plausible solution is to lift such constant weights as graph inputs, so they can be handled in the same way as weights/buffers.
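As a conceptual sketch only (not the actual export pass), lifting could look roughly like this FX transform; the real implementation would also need to record the constant in the program's signature and state:

import torch
import torch.fx

def lift_constant_to_input(gm: torch.fx.GraphModule, attr_name: str) -> torch.fx.GraphModule:
    # Add a new placeholder at the front of the graph standing in for the constant.
    first_node = next(iter(gm.graph.nodes))
    with gm.graph.inserting_before(first_node):
        placeholder = gm.graph.placeholder(attr_name.replace(".", "_"))
    # Redirect every get_attr on the constant to the new input, then drop the node.
    for node in list(gm.graph.nodes):
        if node.op == "get_attr" and node.target == attr_name:
            node.replace_all_uses_with(placeholder)
            gm.graph.erase_node(node)
    gm.recompile()
    return gm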

@gmagogsfm gmagogsfm dismissed SherlockNoMad’s stale review August 25, 2023 22:55

discussed in comment, agreed to proceed

@gmagogsfm gmagogsfm dismissed zhxchen17’s stale review August 25, 2023 22:56

comment resolved, reviewer out of working hours, need to land soon

@angelayi
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job has failed; the first few are: Meta Internal-Only Changes Check

Details for Dev Infra team: raised by workflow job.

@angelayi
Contributor Author

@pytorchbot merge -f "only failure is Meta Internal-Only Changes Check, which is an infra failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

voznesenskym pushed a commit that referenced this pull request Aug 27, 2023
Turns out some graphs will result in getattr nodes...so let's serialize them
Pull Request resolved: #107924
Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri
@github-actions github-actions bot deleted the angelayi/getattr branch February 24, 2025 02:06