
Tensor subclass slice during inference_mode fails #164872

@andrewor14

Description

Error:

```
RuntimeError: Cannot set version_counter for inference tensor
```

Minimal repro:

```python
import torch
from torchao.quantization.quantize_.workflows import Float8Tensor

x = Float8Tensor.from_hp(torch.randn(3, 4))

# This is fine
x[0:1]

# This fails
with torch.inference_mode():
    x[0:1]
```

Float8Tensor's slice dispatches to this handler, which runs to completion without problems; the error seems to be triggered only after the slice itself returns. The same failure occurs with a few other tensor subclasses I've tried, but not with a plain (non-subclassed) torch.Tensor, as the sketch below illustrates.
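For reference, here is a torchao-free sketch of the kind of subclass I mean. `TrivialTensor` is a hypothetical, untested wrapper subclass (not one of the subclasses I actually tried); assuming the failure is generic to wrapper subclasses that re-wrap their outputs in `__torch_dispatch__`, slicing it under inference_mode should hit the same error:

```python
import torch

# Hypothetical minimal wrapper subclass (untested sketch, not from torchao):
# forwards every op through __torch_dispatch__ and re-wraps tensor outputs.
class TrivialTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=elem.device
        )

    def __init__(self, elem):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap subclass arguments, run the real op, then re-wrap the result
        unwrapped = [a.elem if isinstance(a, TrivialTensor) else a for a in args]
        out = func(*unwrapped, **kwargs)
        return TrivialTensor(out) if isinstance(out, torch.Tensor) else out

y = TrivialTensor(torch.randn(3, 4))
y[0:1]  # fine outside inference_mode

with torch.inference_mode():
    y[0:1]  # presumably raises the same RuntimeError, if the bug is subclass-generic
```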

Full C++ stack trace: https://gist.github.com/andrewor14/062b753e72d419d2c1e2a9d4e142b1fa

My question is: why are we incrementing the version counter for an op that is not in-place? Is this expected to work?
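One data point: a possible workaround sketch (untested), assuming the problem is specific to inference tensors. torch.no_grad() disables gradient tracking without allocating inference tensors, so the sliced result should still carry a version counter that can be bumped:

```python
import torch
from torchao.quantization.quantize_.workflows import Float8Tensor

x = Float8Tensor.from_hp(torch.randn(3, 4))

# Untested workaround sketch: no_grad avoids inference tensors entirely,
# so the version-counter write on the sliced result should be legal.
with torch.no_grad():
    x[0:1]
```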

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @ezyang @albanD

Metadata

Labels

- inference mode (Everything related to InferenceMode guard)
- module: dispatch (DispatchStub, Type, void pointer table, c10 dispatch)
- oncall: quantization (Quantization support in PyTorch)
- tensor subclass (Related to tensor subclasses)
- triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
