boxing-unboxing overhead seems significant #139521

@zou3519

https://gist.github.com/zou3519/b987e00a82c7e184b8896a5df7b0bfa9

Benchmarking two cases (a sketch of the setup follows the results below):

  1. torch.ops.mylib.foo: an operator with an Autograd kernel that takes unboxed inputs but a CPU kernel that boxes (it returns to Python)
  2. torch.ops.mylib.foo_cpp: an operator whose Autograd and CPU kernels (both in C++) take unboxed inputs
```
num_tensors 5
2.7380013465881348  # clone
13.052228927612305  # foo
8.257509231567383   # foo_cpp
```
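
For reference, a minimal sketch of the benchmark's shape (the gist above is the source of truth; the schema, kernel body, and iteration count here are assumptions): mylib::foo's CPU kernel lives in Python, so every call is boxed into a stack by the dispatcher and unboxed again on the way back out.

```python
import time
import torch

# Assumed re-creation of the benchmark setup, not the gist itself:
# a custom op whose CPU kernel is implemented in Python, so each call
# crosses the boxed/unboxed boundary inside the dispatcher.
lib = torch.library.Library("mylib", "DEF")
lib.define("foo(Tensor[] xs) -> Tensor[]")

def foo_cpu(xs):
    # Trivial kernel so the timing is dominated by dispatch overhead,
    # not by real work.
    return [x.clone() for x in xs]

lib.impl("foo", foo_cpu, "CPU")

num_tensors = 5
xs = [torch.randn(3) for _ in range(num_tensors)]

def bench(fn, iters=10_000):
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return time.perf_counter() - start

print(bench(lambda: [x.clone() for x in xs]))  # baseline: clone
print(bench(lambda: torch.ops.mylib.foo(xs)))  # Python (boxed) kernel
```

The foo_cpp variant would register the same schema from a C++ extension (e.g. via TORCH_LIBRARY_IMPL), so both its Autograd and CPU kernels stay unboxed end to end.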

NB: We register an Autograd kernel that accepts unboxed inputs to emulate how built-in PyTorch operators work. If I delete the autograd registration for both operators, the Autograd key is instead handled by a boxed fallback, which brings the numbers much closer together (both at around 8). It looks like a single unboxing isn't bad, but a boxing is.
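
A sketch of what that Autograd registration might look like in Python, assuming a non-differentiable op that simply redispatches below the Autograd key (the gist may do this differently):

```python
# Assumed unboxed-style Autograd kernel, emulating how built-in ops
# register one. Deleting this registration leaves the Autograd key to
# the default boxed fallback described above.
def foo_autograd(xs):
    # No backward needed for this benchmark; just redispatch to the
    # kernels below the Autograd key, as a generated kernel would.
    with torch._C._AutoDispatchBelowAutograd():
        return torch.ops.mylib.foo(xs)

lib.impl("foo", foo_autograd, "Autograd")
```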

Labels: module: dispatch, triaged
