-
Notifications
You must be signed in to change notification settings - Fork 225
Fix incorrect CPU topology initialization #3304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| // Call to `getCpuId` changes global settings, in particular, | ||
| // changes default number of threads in the threading layer | ||
| const int cpuid = env->getCpuId(); | ||
| sys_info_["top_enabled_cpu_extension"] = std::make_any<cpu_extension>(from_daal_cpu_type(cpuid)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be converted into a struct instead? Looks like the fields are always the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it could. It also might help to fix the other part of the issue.
|
@Vika-F I can confirm that something in this PR is indeed fixing issues with undefined memory access, at the very least on ice lake, as confirmed by memsan no longer reporting incorrect accesses, and by asan no longer reporting leaked memory from a corrupted glktsn object. It still segfaults if sklearnex is imported but not used, and there might still be other issues causing segfaults upon import in other scenarios, but this at least solves the main problem. I still see issues like this reported from tysan when running the C++ examples though: .. but guess there's some chance that they might be due to some unrelated memory corruption that coincidentally happens to affect that object at that point. |
… std::array usage to check for out of bound access
|
/intelci: run |
|
/intelci: run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves error handling and fixes bugs in CPU topology initialization in oneDAL/DAAL. The changes focus on making the global CPU topology detection more robust and preventing undefined behavior when the Environment object is uninitialized.
- Fixed null pointer safety by adding checks for uninitialized Environment instances
- Improved error handling in CPU topology detection functions
- Fixed memory safety issues in CPU topology analysis functions
Reviewed Changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/oneapi/dal/detail/parameters/system_parameters_impl.hpp | Changed class to struct and updated map key type from string to const char* |
| cpp/oneapi/dal/detail/parameters/system_parameters_impl.cpp | Added null pointer checks for Environment and improved parameter initialization |
| cpp/oneapi/dal/detail/parameters/system_parameters.cpp | Simplified smart pointer construction |
| cpp/oneapi/dal/detail/cpu.hpp | Updated CPU feature map key type and moved architecture defines |
| cpp/oneapi/dal/common.hpp | Added architecture detection defines moved from cpu.hpp |
| cpp/daal/src/services/service_topo.h | Removed large amounts of platform-specific code and type definitions |
| cpp/daal/src/services/library_version_info.cpp | Added null pointer check for Environment instance |
| cpp/daal/src/services/env_detect.cpp | Enhanced Environment initialization with error handling and null checks |
| cpp/daal/src/externals/service_service_mkl.h | Added error checking for CPU topology status |
| cpp/daal/src/data_management/data_conversion.cpp | Added null pointer check for Environment instance |
| cpp/daal/src/algorithms/k_nearest_neighbors/kdtree_knn_classification_train_dense_default_impl.i | Added null pointer check for Environment instance |
| cpp/daal/include/algorithms/algorithm_base_common.h | Added null pointer check for Environment instance |
|
/intelci: run |
…zation in case of hyper-threading enabled
|
/intelci: run |
|
/intelci: run |
|
/intelci: run |
|
/intelci: run |
|
/intelci: run |
|
/intelci: run |
1 similar comment
|
/intelci: run |
|
/intelci: run |
Description
The error handling in the global object that defines CPU topology in oneDAL/DAAL and in global
daal::services::Environmentobject was improved.Several related bugs were fixed.
setChkProcessAffinityConsistency()was changed to allow affinity masks that contain zeros. Previously having the process affinity mask with zeros lead to undefined behavior because the global object that defines CPU topology ended up in non-initialized state in that case.analyzeEachCHierarchy()function. PreviouslypDetectedEachCIDsandpDetectThreadIDsperEachCwere invalidly writing the memory after the end of arrays in the branchif (!CacheMarked).daal::services::Environmentclass was updated to always callgetCpuId()method which initializes theEnvironmentinstance. Without this call an uninitialized version ofEnvironmentmight be used.daal::services::Environment::getInstance()function that returns the pointer to a global DAAL'sEnvironmentobject was changed. Now the function returnsnullptrin case the globalEnvironmentobject is uninitialized. The respective null pointer checks were added into the calling code.daal::services::Environment::initNumberOfThreads()was changed to handle the case of hyper-threading correctly, setting the maximal number of threads available to oneDAL equal to the number of physical CPU cores.PR completeness and readability
Testing
Performance
not applicable