How should we handle PyTorch build flags in torch/headeronly for custom ops? #164786

@janeyx99

Description

🐛 Describe the bug

This isn't exactly a bug, per se, but it is misleading. Thanks to @mikaylagawarecki, who pointed out the following phenomenon in a parallel file, I'm realizing we have the following behavior in torch/headeronly/util/Half.h today:

Consider the following ifdef:

#if (defined(CPU_CAPABILITY_AVX2) || defined(CPU_CAPABILITY_AVX512)) && \
    !defined(__APPLE__)
#include <torch/headeronly/cpu/vec/vec_half.h>
#endif

When libtorch is compiling Half.h, it will properly generate the fast vectorization logic depending on how CPU_CAPABILITY_AVX2 and CPU_CAPABILITY_AVX512 are set. Great. This is expected.

What may be unexpected is that custom ops including the headeronly Half.h will not have CPU_CAPABILITY_AVX2 or CPU_CAPABILITY_AVX512 defined, and so will not get the performant CPU code for float2half_scalar and half2float_scalar in Half.h.

Versions

on main

cc @malfet @seemethere @chauhang @penguinwu @zou3519 @bdhirsh @swolchok
