@csadorf
Contributor

@csadorf csadorf commented Mar 17, 2025

This PR adds support for sparse input arrays in the KMeans algorithm by dispatching to the CPU implementation when a sparse array is detected during fitting. It also makes the sparse array detection utilities more robust and consistent across the codebase.

Fixes scikit-learn test test_kmeans_results[float64-lloyd-sparse_array] in combination with #6442.

Changes

  • Added _should_dispatch_cpu method to KMeans to handle sparse input arrays
  • Updated is_sparse utility function to use issparse instead of isspmatrix for better compatibility
  • Updated sparse array detection in input_utils.py to use the new issparse method
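
For readers unfamiliar with the distinction: scipy's issparse accepts both the older sparse matrix classes and the newer sparse array classes, which is why it is the more compatible check. The helper below is only a minimal sketch of that kind of detection, not the code in this PR; the function body and the optional-import handling are assumptions made for illustration.

```python
def is_sparse(x):
    """Return True if x is a SciPy or CuPy sparse matrix or sparse array.

    Illustrative sketch only: the real cuML utility may differ.
    """
    try:
        import scipy.sparse

        # issparse covers sparse matrices and sparse arrays alike.
        if scipy.sparse.issparse(x):
            return True
    except ImportError:
        pass  # SciPy is not installed; fall through to the CuPy check.

    try:
        import cupyx.scipy.sparse

        if cupyx.scipy.sparse.issparse(x):
            return True
    except ImportError:
        pass  # CuPy is not installed or cannot be imported here.

    return False
```

Dense inputs such as nested lists or NumPy arrays return False, so the caller can keep them on the regular code path.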

Testing

  • Verified that KMeans correctly dispatches to the CPU implementation when sparse arrays are detected

csadorf added 2 commits March 17, 2025 12:19
- Introduced a new method in KMeans to dispatch to the CPU implementation when sparse arrays are detected during fitting.
- Updated the is_sparse function to use cupyx's and scipy's issparse method for better compatibility.
- Introduced a new test to verify that KMeans correctly dispatches to CPU when fitting with sparse input.
- Ensured that the model's attributes and predictions are validated as numpy arrays when using sparse data.
@csadorf csadorf self-assigned this Mar 17, 2025
@csadorf csadorf requested a review from a team as a code owner March 17, 2025 17:22
@csadorf csadorf requested review from betatim and teju85 March 17, 2025 17:22
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Mar 17, 2025
@csadorf csadorf added bug Something isn't working non-breaking Non-breaking change labels Mar 17, 2025
@csadorf
Contributor Author

csadorf commented Mar 17, 2025

This is not taking advantage of some of the existing infrastructure for this:

    # check for sparse inputs and whether estimator supports them
    sparse_support = "sparse" in self._get_tags()["X_types_gpu"]
    if args and is_sparse(args[0]):
        if sparse_support:
            return DeviceType.device
        elif GlobalSettings().accelerator_active and not sparse_support:
            logger.info(
                f"cuML: Estimator {self} does not support sparse inputs in GPU."
            )
            return DeviceType.host
        else:
            raise NotImplementedError(
                "Estimator does not support sparse inputs currently"
            )
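
To make the decision in the snippet above easier to follow in isolation, here is a standalone sketch of the same three-way branch. It is not cuML code: the DeviceType enum is a stand-in defined locally, and choose_device_for_input and its boolean parameters are hypothetical names that replace the tag lookup, the is_sparse check, and the GlobalSettings query.

```python
from enum import Enum, auto


class DeviceType(Enum):
    """Local stand-in for the dispatch targets named in the snippet."""

    host = auto()
    device = auto()


def choose_device_for_input(input_is_sparse, sparse_support, accelerator_active):
    """Mirror the sparse-input branch of the dispatch logic.

    Returns None when the input is dense, meaning this check makes no
    decision and the caller continues with its normal dispatch rules.
    """
    if not input_is_sparse:
        return None
    if sparse_support:
        # The estimator handles sparse data on the GPU.
        return DeviceType.device
    if accelerator_active:
        # No GPU sparse support, but falling back to the CPU is allowed.
        return DeviceType.host
    raise NotImplementedError(
        "Estimator does not support sparse inputs currently"
    )
```

Under these assumptions, a sparse input to an estimator without GPU sparse support yields DeviceType.host when the accelerator is active and raises NotImplementedError otherwise.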

Contributor

@viclafargue viclafargue left a comment

LGTM, just minor recommendations

@csadorf
Contributor Author

csadorf commented Mar 18, 2025

/merge

Member

@jcrist jcrist left a comment

:shipit:

@rapids-bot rapids-bot bot merged commit de28250 into rapidsai:branch-25.04 Mar 25, 2025
75 of 76 checks passed
@csadorf csadorf deleted the fix/fallback-kmeans-to-cpu-for-sparse-inputs branch March 26, 2025 14:41
Labels

bug Something isn't working Cython / Python Cython or Python issue non-breaking Non-breaking change

3 participants