How should we handle PyTorch build flags in torch/headeronly for custom ops?

### 🐛 Describe the bug

This isn't exactly a bug, per se, but it is misleading. Thanks to @mikaylagawarecki, who pointed out the same phenomenon in a parallel file, I've realized that torch/headeronly/util/Half.h behaves as follows today:

Consider the following `#if` guard:
https://github.com/pytorch/pytorch/blob/6861fa43e5fee7fedc0213e352fa983edea8aa78/torch/headeronly/util/Half.h#L44-L47

When libtorch compiles Half.h, it correctly generates the fast vectorized conversion logic, depending on whether `CPU_CAPABILITY_AVX2` or `CPU_CAPABILITY_AVX512` is set. Great. This is expected.

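What may be unexpected is that custom ops that include the header-only Half.h will _not_ have `CPU_CAPABILITY_AVX2` or `CPU_CAPABILITY_AVX512` set, because these are flags defined by PyTorch's own build rather than by the compiler. As a result, custom ops will not get the performant CPU code for `float2half_scalar` and `half2float_scalar` that Half.h pulls in via `vec_half.h`.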

For reference, the linked lines (Half.h#L44-L47) read:

```cpp
#if (defined(CPU_CAPABILITY_AVX2) || defined(CPU_CAPABILITY_AVX512)) && \
    !defined(__APPLE__)
#include <torch/headeronly/cpu/vec/vec_half.h>
#endif
```

### Versions

on main

cc @malfet @seemethere @chauhang @penguinwu @zou3519 @bdhirsh @swolchok