Skip to content

Conversation

@Vika-F
Copy link
Contributor

@Vika-F Vika-F commented Jul 25, 2025

Description

The error handling in the global object that defines CPU topology in oneDAL/DAAL and in global daal::services::Environment object was improved.
Several related bugs were fixed.

  • cpp/daal/src/services/service_topo.h and cpp/daal/src/services/service_topo.cpp files re-worked to make it less error prone.
  • The logic in setChkProcessAffinityConsistency() was changed to allow affinity masks that contain zeros. Previously having the process affinity mask with zeros lead to undefined behavior because the global object that defines CPU topology ended up in non-initialized state in that case.
  • Out-of-bound memory access issue was fixed in analyzeEachCHierarchy() function. Previously pDetectedEachCIDs and pDetectThreadIDsperEachC were invalidly writing the memory after the end of arrays in the branch if (!CacheMarked).
  • The constructor of daal::services::Environment class was updated to always call getCpuId() method which initializes the Environment instance. Without this call an uninitialized version of Environment might be used.
  • The behavior of daal::services::Environment::getInstance() function that returns the pointer to a global DAAL's Environment object was changed. Now the function returns nullptr in case the global Environment object is uninitialized. The respective null pointer checks were added into the calling code.
  • The behavior of daal::services::Environment::initNumberOfThreads() was changed to handle the case of hyper-threading correctly, setting the maximal number of threads available to oneDAL equal to the number of physical CPU cores.

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

not applicable

@Vika-F Vika-F added the bug label Jul 25, 2025
// Call to `getCpuId` changes global settings, in particular,
// changes default number of threads in the threading layer
const int cpuid = env->getCpuId();
sys_info_["top_enabled_cpu_extension"] = std::make_any<cpu_extension>(from_daal_cpu_type(cpuid));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be converted into a struct instead? Looks like the fields are always the same.

Copy link
Contributor Author

@Vika-F Vika-F Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could. It also might help to fix the other part of the issue.

@david-cortes-intel
Copy link
Contributor

@Vika-F I can confirm that something in this PR is indeed fixing issues with undefined memory access, at the very least on ice lake, as confirmed by memsan no longer reporting incorrect accesses, and by asan no longer reporting leaked memory from a corrupted glktsn object.

It still segfaults if sklearnex is imported but not used, and there might still be other issues causing segfaults upon import in other scenarios, but this at least solves the main problem.

I still see issues like this reported from tysan when running the C++ examples though:

WRITE of size 8 at 0x7fb6a9d64fd0 with type p1 int (in daal::services::internal::Dyn2Arr_str at offset 8) accesses part of an existing object of type daal::services::internal::glktsn that starts at offset -64

.. but guess there's some chance that they might be due to some unrelated memory corruption that coincidentally happens to affect that object at that point.

@Vika-F
Copy link
Contributor Author

Vika-F commented Jul 30, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Jul 31, 2025

/intelci: run

@Vika-F Vika-F requested a review from Copilot July 31, 2025 08:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves error handling and fixes bugs in CPU topology initialization in oneDAL/DAAL. The changes focus on making the global CPU topology detection more robust and preventing undefined behavior when the Environment object is uninitialized.

  • Fixed null pointer safety by adding checks for uninitialized Environment instances
  • Improved error handling in CPU topology detection functions
  • Fixed memory safety issues in CPU topology analysis functions

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
cpp/oneapi/dal/detail/parameters/system_parameters_impl.hpp Changed class to struct and updated map key type from string to const char*
cpp/oneapi/dal/detail/parameters/system_parameters_impl.cpp Added null pointer checks for Environment and improved parameter initialization
cpp/oneapi/dal/detail/parameters/system_parameters.cpp Simplified smart pointer construction
cpp/oneapi/dal/detail/cpu.hpp Updated CPU feature map key type and moved architecture defines
cpp/oneapi/dal/common.hpp Added architecture detection defines moved from cpu.hpp
cpp/daal/src/services/service_topo.h Removed large amounts of platform-specific code and type definitions
cpp/daal/src/services/library_version_info.cpp Added null pointer check for Environment instance
cpp/daal/src/services/env_detect.cpp Enhanced Environment initialization with error handling and null checks
cpp/daal/src/externals/service_service_mkl.h Added error checking for CPU topology status
cpp/daal/src/data_management/data_conversion.cpp Added null pointer check for Environment instance
cpp/daal/src/algorithms/k_nearest_neighbors/kdtree_knn_classification_train_dense_default_impl.i Added null pointer check for Environment instance
cpp/daal/include/algorithms/algorithm_base_common.h Added null pointer check for Environment instance

@Vika-F Vika-F marked this pull request as ready for review July 31, 2025 15:03
@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 15, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 16, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 17, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 18, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 19, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 22, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 23, 2025

/intelci: run

1 similar comment
@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 23, 2025

/intelci: run

@Vika-F
Copy link
Contributor Author

Vika-F commented Dec 23, 2025

/intelci: run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants