Skip to content

Conversation

@rcap107
Copy link
Member

@rcap107 rcap107 commented Jul 23, 2025

Fixes #1540

@rcap107 rcap107 changed the title Fix 1540 Fix - get_feature_names_out fails in two cases Jul 24, 2025
@rcap107 rcap107 marked this pull request as ready for review July 28, 2025 12:46
@rcap107
Copy link
Member Author

rcap107 commented Jul 28, 2025

The tests are failing with the min-optional-deps configuration because the column transformer in min-deps configuration does not support polars dataframes: in all_columns, it is expecting a pandas index, while polars gives a list.

How should I deal with this ? I can remove the test with the column transformer (I added it mostly to check for the behavior in #1540), but maybe there is a way to skip the test for the given configuration. What's the best choice here?

@rcap107 rcap107 added this to the 0.6.1 milestone Jul 28, 2025
Copy link
Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rcap107! After testing on different CI versions with pixi, it looks that the error you have is specific to Polars on column transformers, on a prior version of sklearn. We can go two ways:

  • We force conversion from polars to Pandas for old sklearn versions
  • or we ignore it and suggest people to upgrade to latest sklearn and polars if they aim at using column transformer

WDYT?

rcap107 and others added 2 commits July 29, 2025 14:28
@rcap107
Copy link
Member Author

rcap107 commented Jul 29, 2025

Hey @rcap107! After testing on different CI versions with pixi, it looks that the error you have is specific to Polars on column transformers, on a prior version of sklearn. We can go two ways:

  • We force conversion from polars to Pandas for old sklearn versions
  • or we ignore it and suggest people to upgrade to latest sklearn and polars if they aim at using column transformer

WDYT?

Personally I'd rather go with the second, but since it's on sklearn side I wonder if someone that is stuck on that version just can't upgrade it 🤔

So I think the first choice is overall better, but we need to document it properly

@rcap107
Copy link
Member Author

rcap107 commented Jul 30, 2025

Hey @rcap107! After testing on different CI versions with pixi, it looks that the error you have is specific to Polars on column transformers, on a prior version of sklearn. We can go two ways:

  • We force conversion from polars to Pandas for old sklearn versions
  • or we ignore it and suggest people to upgrade to latest sklearn and polars if they aim at using column transformer

WDYT?

Personally I'd rather go with the second, but since it's on sklearn side I wonder if someone that is stuck on that version just can't upgrade it 🤔

So I think the first choice is overall better, but we need to document it properly

I'm trying to implement this, but it's not clear to me where I should force the conversion to Polars, given that the call that fails is inside the code of the column transformer, and not on our side

Unless we convert by default (again, where?) polars dataframes to pandas if the sklearn version is too low.

@Vincent-Maladiere
Copy link
Member

You're right, the issue is on sklearn side. One way to do it would be to patch the fit_transform method of the ColumnTransformer for sklearn < 1.?.0. in skrub/sklearn_compat.py. Or we could ignore it and skip fail the test in this situation.

@rcap107
Copy link
Member Author

rcap107 commented Aug 1, 2025

I had to patch the fit_transform and transform methods of the ColumnTransformer so that they convert X to pandas if it is a polars dataframe or series

Copy link
Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rcap107, some feedback

ColumnTransformer.original_fit_transform = ColumnTransformer.fit_transform
ColumnTransformer.original_transform = ColumnTransformer.transform

def _patched_fit_transform(self, X, y=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could simplify the logic slightly by using a decorator, and use it like:

ColumnTransformer.fit_transform = wrap_to_pandas(ColumnTransformer.fit_transform)
ColumnTransformer.transform = wrap_to_pandas(ColumnTransformer.transform)

original_type = None
if isinstance(X, (pl.DataFrame, pl.Series)):
original_type = type(X)
X = X.to_pandas()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if y is not None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y is forwarded unchanged to the ColumnTransformer in the fit_transform, transform does not take y at all

Copy link
Member

@Vincent-Maladiere Vincent-Maladiere Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be an issue if y was a pandas DataFrame/series?

@rcap107
Copy link
Member Author

rcap107 commented Aug 1, 2025

as it turns out, the bug with DropCols is also affecting any situation where ApplyToCols is used, both explicitly, or from within the dataops evaluation, so it's a more serious issue than I had first understood

@Vincent-Maladiere
Copy link
Member

the bug with DropCols is also affecting any situation where ApplyToCols is used

I'm not sure to follow, could you show an example?

@rcap107
Copy link
Member Author

rcap107 commented Aug 1, 2025

the bug with DropCols is also affecting any situation where ApplyToCols is used

I'm not sure to follow, could you show an example?

this code fails on main, but it works with the fix in this PR

#%%
import skrub
from skrub import DropCols
import polars as pl
import skrub.selectors as s
from skrub._apply_to_cols import ApplyToCols

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],})

# %%
app = ApplyToCols(DropCols(s.glob("")),)
app.fit_transform(df)
# %%
v = skrub.var("df", df)
v.skb.apply(app)

@rcap107
Copy link
Member Author

rcap107 commented Aug 4, 2025

The reason this issue is stuck is because the ColumnTransformer does not work with polars on scikit-learn < 1.5, so I tried to patch it to fix the problem. However, this was a bigger issue than anticipated.

I realized that the test that was failing was something that I added to reproduce the error in #1540, so the problem has been there for much longer.

I think we should ignore that particular problem in this PR, and tackle separately. Here, we can address the specific problems with #1540.

@rcap107
Copy link
Member Author

rcap107 commented Aug 4, 2025

In short, I'll remove the extra test that is failing and all the sklearn compat code, and continue with the rest so we can merge the fix

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a similar fix than in #1256 so LGTM.

@rcap107 rcap107 merged commit 73053a0 into skrub-data:main Aug 12, 2025
26 checks passed
@rcap107 rcap107 deleted the fix-1540 branch August 12, 2025 11:42
glemaitre added a commit to glemaitre/skrub that referenced this pull request Aug 14, 2025
Co-authored-by: Jérôme Dockès <[email protected]>
Co-authored-by: Vincent M <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
rcap107 added a commit that referenced this pull request Aug 14, 2025
Co-authored-by: Jérôme Dockès <[email protected]>
Co-authored-by: Vincent M <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Discussion: is get_features_name_out something missing or on purpose?

4 participants