🐛 Describe the bug
This bug is in the C++ part of PyTorch. Currently (at least for PyTorch 2.6.0 through 2.8.0), the template versions of `data_ptr` and `mutable_data_ptr` are declared as follows in `ATen/core/TensorBase.h`:
```cpp
template <typename T>
T* mutable_data_ptr() const;

template <typename T>
T* data_ptr() const;
```
Since the implementations of these functions are not included in the header, they are not visible to the C++ compiler when someone merely includes the header without compiling the whole PyTorch project from source. (This is the usual situation when building a C++ op extension that accepts a `torch.Tensor`.)
As a result, the C++ compiler just emits a symbol reference to `mutable_data_ptr<T>` and hopes that the symbol exists in `torch_cpu.so`. However, this isn't always true: for example, `mutable_data_ptr<char>` does not exist in `torch_cpu.so`. Since symbol resolution for a shared library is deferred until load time, the build of your module will succeed, but when you try to import it you will hit an error like `undefined symbol: _ZNK2at10TensorBase16mutable_data_ptrIcEEPT_v`, which demangles to `char* at::TensorBase::mutable_data_ptr<char>() const`.
In general, whenever you expose a template function in a public interface, you should provide its full implementation in the header file, since you cannot explicitly instantiate it for every possible type. I think a simple fix is to replace the declarations above with the following, which simply forwards to the non-template versions:
```cpp
template <typename T>
T* mutable_data_ptr() const {
  return reinterpret_cast<T*>(mutable_data_ptr());
}

template <typename T>
T* data_ptr() const {
  return reinterpret_cast<T*>(data_ptr());
}
```
Versions
I am currently using torch==2.7.0+cpu, but I believe this problem exists across a wide range of versions.