-
Notifications
You must be signed in to change notification settings - Fork 185
Add add_tostr flag to Cleaner to control ToStr transformation #1789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add add_tostr flag to Cleaner to control ToStr transformation #1789
Conversation
rcap107
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @PilliSiddharth. Overall, it looks good, but there are a few things that should be changed.
We should also rethink the name of the add_tostr parameter, but at the moment I don't have a very good alternative in mind.
CHANGES.rst
Outdated
| ``.skb.set_name()``, ``.skb.set_description()``, ``.skb.mark_as_X()`` or | ||
| ``.skb.mark_as_y()`` used to raise an error, this has been fixed in :pr:`1782` | ||
| by :user:`Jérôme Dockès <jeromedockes>`. | ||
| - Added ``add_tostr`` parameter to :class:`Cleaner` to prevent unintended |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, even though I suggested it I don't like add_tostr as a parameter name for the Cleaner 🤔 we might have to change this before merging.
A user doesn't know what tostr is (since it's a private transformer), so add_tostr does not give enough information.
skrub/_table_vectorizer.py
Outdated
| categorical, or datetime. This step is controlled by the ``add_tostr`` | ||
| parameter. When ``add_tostr=True`` (default), string conversion is applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned above, the name add_tostr will probably be changed
|
Hey @rcap107, I moved the |
|
Hi @PilliSiddharth, I think you forgot to push your latest commit. In the end, I'd like the parameter to be called
|
0f6eb11 to
28b0403
Compare
|
Hey @rcap107, I pushed the updates: renamed the parameter to cast_to_str, updated the test and docstring, and fixed the example output. |
very minor change
|
Hi @PilliSiddharth, thanks a lot for the PR. I made a small change in the order of parameters, swapping An aside, if you are having issues with doctests failing, it's often easier to use the ELLIPSIS ( |
|
Whoops, the order change broke the CI, I'll fix it |
|
Sounds good, glad I could be a part! 😄 |
Add
add_tostrflag toCleanerto controlToStrtransformationSummary
This PR introduces an
add_tostrparameter to theCleanertransformer, allowing users to disable the automaticToStr()conversion applied to non-numeric, non-categorical, and non-datetime columns. This prevents list/object-like columns from being converted to strings unless explicitly requested.What’s changed
add_tostrargument toCleaner(default:True)._get_preprocessorsto includeToStr()only whenadd_tostr=True.Cleanerdocstring (parameter documented; notes updated).test_add_tostrto validate bothadd_tostr=Trueandadd_tostr=False.Motivation
Cleanershould not silently change complex dtypes (e.g.,list,object) to strings. This flag gives users control and mirrors the existing pattern used fornumeric_dtype="float32".Test coverage
skrub/tests/test_cleaner.py::test_add_tostradded.