-
Notifications
You must be signed in to change notification settings - Fork 185
Fix - get_feature_names_out fails in two cases
#1543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The tests are failing with the How should I deal with this ? I can remove the test with the column transformer (I added it mostly to check for the behavior in #1540), but maybe there is a way to skip the test for the given configuration. What's the best choice here? |
Vincent-Maladiere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @rcap107! After testing on different CI versions with pixi, it looks that the error you have is specific to Polars on column transformers, on a prior version of sklearn. We can go two ways:
- We force conversion from polars to Pandas for old sklearn versions
- or we ignore it and suggest people to upgrade to latest sklearn and polars if they aim at using column transformer
WDYT?
Co-authored-by: Vincent M <[email protected]>
Co-authored-by: Vincent M <[email protected]>
Personally I'd rather go with the second, but since it's on sklearn side I wonder if someone that is stuck on that version just can't upgrade it 🤔 So I think the first choice is overall better, but we need to document it properly |
I'm trying to implement this, but it's not clear to me where I should force the conversion to Polars, given that the call that fails is inside the code of the column transformer, and not on our side Unless we convert by default (again, where?) polars dataframes to pandas if the sklearn version is too low. |
|
You're right, the issue is on sklearn side. One way to do it would be to patch the |
|
I had to patch the fit_transform and transform methods of the ColumnTransformer so that they convert |
Vincent-Maladiere
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @rcap107, some feedback
skrub/_sklearn_compat.py
Outdated
| ColumnTransformer.original_fit_transform = ColumnTransformer.fit_transform | ||
| ColumnTransformer.original_transform = ColumnTransformer.transform | ||
|
|
||
| def _patched_fit_transform(self, X, y=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simplify the logic slightly by using a decorator, and use it like:
ColumnTransformer.fit_transform = wrap_to_pandas(ColumnTransformer.fit_transform)
ColumnTransformer.transform = wrap_to_pandas(ColumnTransformer.transform)
skrub/_sklearn_compat.py
Outdated
| original_type = None | ||
| if isinstance(X, (pl.DataFrame, pl.Series)): | ||
| original_type = type(X) | ||
| X = X.to_pandas() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if y is not None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y is forwarded unchanged to the ColumnTransformer in the fit_transform, transform does not take y at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be an issue if y was a pandas DataFrame/series?
|
as it turns out, the bug with DropCols is also affecting any situation where ApplyToCols is used, both explicitly, or from within the dataops evaluation, so it's a more serious issue than I had first understood |
I'm not sure to follow, could you show an example? |
this code fails on main, but it works with the fix in this PR #%%
import skrub
from skrub import DropCols
import polars as pl
import skrub.selectors as s
from skrub._apply_to_cols import ApplyToCols
df = pl.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6],})
# %%
app = ApplyToCols(DropCols(s.glob("")),)
app.fit_transform(df)
# %%
v = skrub.var("df", df)
v.skb.apply(app) |
|
The reason this issue is stuck is because the ColumnTransformer does not work with polars on scikit-learn < 1.5, so I tried to patch it to fix the problem. However, this was a bigger issue than anticipated. I realized that the test that was failing was something that I added to reproduce the error in #1540, so the problem has been there for much longer. I think we should ignore that particular problem in this PR, and tackle separately. Here, we can address the specific problems with #1540. |
|
In short, I'll remove the extra test that is failing and all the sklearn compat code, and continue with the rest so we can merge the fix |
glemaitre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a similar fix than in #1256 so LGTM.
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Jérôme Dockès <[email protected]> Co-authored-by: Vincent M <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Jérôme Dockès <[email protected]> Co-authored-by: Vincent M <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]>
Fixes #1540