Skipping associations compute if the number of cols > specified #1304

victoris93 · 2025-04-11T12:09:07Z

Closes #1287.

I made changes to skip the associations computation. I haven't modified the tests yet; that's coming right up.
It would make sense to remove the Associations tab from the html table altogether.

victoris93 · 2025-04-11T16:02:20Z

Final changes:

max_association_columns = 30
If associations are not computed (n_cols > max_association_columns), the Associations tab is excluded from the report
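
As a rough illustration of the rule above, here is a minimal sketch of the threshold check. The helper name `should_compute_associations` is hypothetical and not part of skrub, and treating `None` as "no limit" is an assumption based on the None handling discussed in the review below.

```python
def should_compute_associations(n_columns, max_association_columns=30):
    """Hypothetical helper: decide whether associations are computed.

    Associations are skipped only when the dataframe has strictly more
    columns than the limit. None is assumed to mean "no limit".
    """
    if max_association_columns is None:
        return True
    return n_columns <= max_association_columns


# At the limit the associations are still computed; one column more and
# they are skipped (and the Associations tab would be left out).
print(should_compute_associations(30))   # at the default limit
print(should_compute_associations(31))   # one column over the limit
```

The comparison is inclusive on purpose: the description only skips the computation when `n_cols > max_association_columns`.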

Vincent-Maladiere

Hi @victoris93, thank you for this PR!

@jeromedockes, we probably should have a single parameter to control both max_plot_columns and max_association_columns, WDYT? We can do it in another PR, though.

victoris93 · 2025-04-14T09:41:01Z

@Vincent-Maladiere we briefly touched upon this in our last conversation w/ @jeromedockes; imo it's an elegant simplification but from the user standpoint it's a big assumption that they would always prefer to avoid both the plotting and the association computation and for the same n. of columns. But totally your call ofc.

victoris93 · 2025-04-14T09:42:43Z

Though if you decide to implement it later, would be down to submit another PR

jeromedockes · 2025-04-14T10:05:46Z

I don't have a strong opinion. @victoris93 is right that it's an assumption that a user would want to tie the 2 together. OTOH my guess would be that most users will leave those parameters to the default value all the time 🤔

Vincent-Maladiere

I agree that very few users will actually tweak these parameters. That said, having both parameters could make sense for users who, for example, have many correlated columns but would rather not display the distributions of all of them. Therefore we can keep things as they are :)

Vincent-Maladiere · 2025-04-14T11:59:14Z

skrub/_reporting/_table_report.py

        self.dataframe = dataframe
        self.verbose = verbose
        self.max_plot_columns = max_plot_columns
+        self.max_association_columns = max_association_columns


Nitpick: we could handle None values (below) when setting the class attributes (above) to simplify the flow a bit.

Vincent-Maladiere · 2025-04-14T12:00:43Z

skrub/_reporting/_table_report.py

+            with_plots=with_plots,
+            with_associations=with_associations,


We could simplify the lines above like:

Suggested change

-            with_plots=with_plots,
-            with_associations=with_associations,
+            with_plots=self.max_plot_columns >= self.n_columns,
+            with_associations=self.max_association_columns >= self.n_columns,

oh that's right. There're some lines to delete just above, so will commit separately

victoris93 · 2025-04-14T12:20:25Z

skrub/_reporting/_table_report.py

+        self.max_association_columns = max_association_columns
+        self.n_columns = sbd.shape(self.dataframe)[1]
+
+        if self.max_plot_columns is None:


@Vincent-Maladiere Not sure I understand the comment about None value handling. Would you like None values to be handled differently than what I have here?

It's almost nothing, having something like:

self.max_plot_columns = self.n_columns if max_plot_columns is None else max_plot_columns

Ah to follow @jeromedockes suggestion, we could have self.max_plot_columns = max_plot_columns, and then derive self.max_plot_columns_ using your logic. self.max_plot_columns_ would be the one actually used. I won't comment further as this gets too nitpicky
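
A minimal sketch of the pattern described in this thread, assuming a stand-in class (`ReportSketch` is hypothetical, not the real `TableReport`): the constructor arguments are stored exactly as passed, so `None` stays visible, and the resolved values live in separate trailing-underscore attributes.

```python
class ReportSketch:
    """Hypothetical stand-in for TableReport, for illustration only."""

    def __init__(self, n_columns, max_plot_columns=30, max_association_columns=30):
        self.n_columns = n_columns
        # Keep the user-facing parameters exactly as they were passed.
        self.max_plot_columns = max_plot_columns
        self.max_association_columns = max_association_columns
        # Derived values: None resolves to "all columns".
        self.max_plot_columns_ = (
            n_columns if max_plot_columns is None else max_plot_columns
        )
        self.max_association_columns_ = (
            n_columns if max_association_columns is None else max_association_columns
        )


report = ReportSketch(n_columns=50, max_plot_columns=None)
print(report.max_plot_columns)    # the original argument is preserved
print(report.max_plot_columns_)   # the value actually used for the check
```

With this split it remains possible to tell a report created with `max_plot_columns=None` apart from one created with the exact number of columns.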

jeromedockes · 2025-04-14T12:19:57Z

skrub/_reporting/_table_report.py

+        self.n_columns = sbd.shape(self.dataframe)[1]
+
+        if self.max_plot_columns is None:
+            self.max_plot_columns = self.n_columns
+        if self.max_association_columns is None:
+            self.max_association_columns = self.n_columns


I would rather still store them as None so it is easy to check if the report was created with max_plot_columns set to None or to the number of columns in the dataframe

jeromedockes · 2025-04-14T12:25:16Z

skrub/_reporting/_table_report.py

-        return summarize_dataframe(
-            self.dataframe, with_plots=True, title=self.title, **self._summary_kwargs
-        )
+    def _lightify_summary(self):


the reason why we had 2 cached properties is that the json does not need the plots so we might want either a full summary or a plot-less one, and both would be cached.

Now if we don't have just 2 kinds of report but several parameters, we should just give up on that. It wasn't really necessary, and was more a relic from an early version where the TableReport was also able to produce output in text format.

We can have just

    @functools.cached_property
    def _summary(self):
        n_columns = sbd.shape(self.dataframe)[1]
        with_plots = self.max_plot_columns is None or self.max_plot_columns >= n_columns
        with_associations = ...
        return summarize_dataframe(self.dataframe, with_plots=with_plots, ...)

(and no more _lightify_summary nor _get_summary )
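
For readers following along, here is a self-contained sketch of the single cached property suggested above. Everything in it is a stand-in: `summarize_stub` replaces `summarize_dataframe`, `ReportSketch` replaces `TableReport`, and the `with_associations` line is an assumption that mirrors the `with_plots` logic, since the original comment leaves it as `...`.

```python
import functools

CALLS = []  # records each call to the stub, to show the caching effect


def summarize_stub(columns, with_plots, with_associations):
    """Hypothetical stand-in for summarize_dataframe."""
    CALLS.append((with_plots, with_associations))
    return {
        "n_columns": len(columns),
        "with_plots": with_plots,
        "with_associations": with_associations,
    }


class ReportSketch:
    """Hypothetical stand-in for TableReport."""

    def __init__(self, columns, max_plot_columns=30, max_association_columns=30):
        self.columns = columns
        self.max_plot_columns = max_plot_columns
        self.max_association_columns = max_association_columns

    @functools.cached_property
    def _summary(self):
        n_columns = len(self.columns)
        with_plots = (
            self.max_plot_columns is None or self.max_plot_columns >= n_columns
        )
        # Assumed to follow the same rule as with_plots.
        with_associations = (
            self.max_association_columns is None
            or self.max_association_columns >= n_columns
        )
        return summarize_stub(self.columns, with_plots, with_associations)


report = ReportSketch(list("abcde"), max_plot_columns=None, max_association_columns=3)
first = report._summary
second = report._summary  # served from the cache, no second computation
print(first["with_plots"], first["with_associations"], len(CALLS))
```

Because the flags are derived inside the property, the stored parameters can stay `None` and there is only one summary to cache.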

jeromedockes · 2025-04-14T12:35:06Z

I think the review comments (at least mine) are getting unclear 😅 sorry about that @victoris93 . As the PR is working fine I suggest we merge it and any further changes can be done in another PR, WDYT?

@victoris93 was there anything you still wanted to change before I merge it?

victoris93 · 2025-04-14T12:37:05Z

@jeromedockes np pb, I think I got it. Just a sec, the last commit should address the comments, almost there

jeromedockes · 2025-04-14T12:39:40Z

ok thanks! If you do get rid of _get_summary() you will need to replace _get_summary() with _summary everywhere in _reporting/tests/test_table_report.py

victoris93 · 2025-04-14T12:58:27Z

@jeromedockes @Vincent-Maladiere should be good now I think

jeromedockes

thanks a lot! LGTM with the suggested changes

skrub/_reporting/_table_report.py

Co-authored-by: Jérôme Dockès <[email protected]>

jeromedockes

great!! thanks so much @victoris93 this is really very useful 🚀 as the cramerv computation in particular can get very slow and take a lot of memory for a large number of columns
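
To give a feel for why the cost grows quickly, here is a small sketch that counts unordered column pairs. It assumes the associations are computed once per pair of columns, which is the usual way a pairwise measure such as Cramér's V is applied. The helper name `n_column_pairs` is hypothetical.

```python
from math import comb


def n_column_pairs(n_columns):
    """Number of unordered column pairs, i.e. n * (n - 1) / 2."""
    return comb(n_columns, 2)


# The pair count grows quadratically with the number of columns.
for n in (10, 30, 100, 1000):
    print(n, n_column_pairs(n))
```

At the default limit of 30 columns this gives 435 pairs, while 1000 columns would give 499500.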

skrub/_reporting/_table_report.py

formatting

victoris93 · 2025-04-14T14:07:51Z

oops I wanted to run the pre-commit checks before you merged but it was too late, sry about that. I hope it's ok.

jeromedockes · 2025-04-14T14:12:25Z

oops I wanted to run the pre-commit checks before you merged but it was too late, sry about that. I hope it's ok.

there was just an extra blank line but I removed it before merging; the pre-commit checks are passing without complaints :) thanks again, and whenever you have time don't hesitate to ping us if you want help in picking the next thing to work on!

GaelVaroquaux · 2025-04-14T15:21:07Z

Thanks Victoria!!

victoris93 and others added 10 commits April 11, 2025 14:07

skipping association compute in TableReport

cb39a5f

rm report._summary_with_plots

c13816c

updated docstring for TableReport

32a588d

added an entry to CHANGES.rst

5cdd3f2

modified tests

d91589c

removed associations tab and added tests for patch displays

24f6087

Merge branch 'main' into no_assoc_table_report

3ee135a

fixed _summarize.py

109adee

Merge branch 'skrub-data:main' into no_assoc_table_report

dbd04f2

fixed patches and testing

5dee34b

victoris93 changed the title ~~Skipping associations compute if the number of cols > default~~ Skipping associations compute if the number of cols > specified Apr 11, 2025

victoris93 marked this pull request as ready for review April 11, 2025 15:55

Vincent-Maladiere reviewed Apr 14, 2025

victoris93 commented Apr 14, 2025

simplify _lightify_summary()

25291fd

jeromedockes reviewed Apr 14, 2025

rm _get_summary(), keep TableReport.max_plot_columns and TableReport.max_association_columns atts as initially set.

d8d4e43

jeromedockes reviewed Apr 14, 2025

jeromedockes reviewed Apr 14, 2025

victoris93 and others added 2 commits April 14, 2025 15:44

separate vars for with_plots, with_associations

ce8cee3

Co-authored-by: Jérôme Dockès <[email protected]>

Update skrub/_reporting/_table_report.py

230109c

Co-authored-by: Jérôme Dockès <[email protected]>

Update skrub/_reporting/_table_report.py

ebeb755

Co-authored-by: Jérôme Dockès <[email protected]>

jeromedockes approved these changes Apr 14, 2025

jeromedockes reviewed Apr 14, 2025

Update skrub/_reporting/_table_report.py

8ef0816

formatting

jeromedockes enabled auto-merge (squash) April 14, 2025 14:01

jeromedockes merged commit 66329b0 into skrub-data:main Apr 14, 2025
21 of 22 checks passed

jeromedockes added the TableReport anything related to the TableReport label Apr 14, 2025
