boxing-unboxing overhead seems significant #139521

@zou3519

https://gist.github.com/zou3519/b987e00a82c7e184b8896a5df7b0bfa9

Benchmarking two cases (a sketch of the setup follows the results below):

  1. torch.ops.mylib.foo: an operator with an Autograd kernel that takes unboxed inputs but a CPU kernel that boxes (it returns to Python)
  2. torch.ops.mylib.foo_cpp: an operator whose Autograd and CPU kernels (both in C++) take unboxed inputs
```
num_tensors 5
2.7380013465881348  # clone
13.052228927612305  # foo
8.257509231567383   # foo_cpp
```
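
For reference, a minimal sketch of the benchmark's shape (the gist above is the source of truth; the schema, kernel body, and iteration count here are assumptions): mylib::foo's CPU kernel lives in Python, so every call is boxed into a stack by the dispatcher and unboxed again on the way back out.

```python
import time
import torch

# Assumed re-creation of the benchmark setup, not the gist itself:
# a custom op whose CPU kernel is implemented in Python, so each call
# crosses the boxed/unboxed boundary inside the dispatcher.
lib = torch.library.Library("mylib", "DEF")
lib.define("foo(Tensor[] xs) -> Tensor[]")

def foo_cpu(xs):
    # Trivial kernel so the timing is dominated by dispatch overhead,
    # not by real work.
    return [x.clone() for x in xs]

lib.impl("foo", foo_cpu, "CPU")

num_tensors = 5
xs = [torch.randn(3) for _ in range(num_tensors)]

def bench(fn, iters=10_000):
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return time.perf_counter() - start

print(bench(lambda: [x.clone() for x in xs]))  # baseline: clone
print(bench(lambda: torch.ops.mylib.foo(xs)))  # Python (boxed) kernel
```

The foo_cpp variant would register the same schema from a C++ extension (e.g. via TORCH_LIBRARY_IMPL), so both its Autograd and CPU kernels stay unboxed end to end.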

NB: We register an Autograd kernel that accepts unboxed inputs to emulate how built-in PyTorch operators work. If I delete the autograd registration for both operators, the Autograd key is instead handled by a boxed fallback, which brings the numbers much closer together (both at around 8). It looks like a single unboxing isn't bad, but a boxing is.
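
A sketch of what that Autograd registration might look like in Python, assuming a non-differentiable op that simply redispatches below the Autograd key (the gist may do this differently):

```python
# Assumed unboxed-style Autograd kernel, emulating how built-in ops
# register one. Deleting this registration leaves the Autograd key to
# the default boxed fallback described above.
def foo_autograd(xs):
    # No backward needed for this benchmark; just redispatch to the
    # kernels below the Autograd key, as a generated kernel would.
    with torch._C._AutoDispatchBelowAutograd():
        return torch.ops.mylib.foo(xs)

lib.impl("foo", foo_autograd, "Autograd")
```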

Labels: module: dispatch, triaged
