@thomass-dev
Contributor

@thomass-dev thomass-dev commented Nov 27, 2025

For reproducibility only: `summarize_dataframe` now accepts a new `seed` parameter, which sets the sampling seed (`sbd.sample`) used when computing the top associations.

I wonder if we should reuse `SKB_SUBSAMPLING_SEED` from `skrub.DataOp.skb.subsample()`.
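
To illustrate why a seed matters here, a minimal standalone sketch follows. It is not the skrub implementation: `sample_rows` is a hypothetical helper written with the standard library only, and it simply assumes that sampling happens only when the data has more rows than the requested sample size.

```python
import random


def sample_rows(rows, n, seed=None):
    """Hypothetical helper: return at most ``n`` rows, sampled without replacement.

    Passing the same ``seed`` always selects the same rows, which makes any
    statistic computed on the sample reproducible across runs.
    """
    if len(rows) <= n:
        # Nothing to sample: return a copy of all rows, in order.
        return list(rows)
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return rng.sample(rows, n)


rows = [{"id": i, "value": i * i} for i in range(1000)]

first = sample_rows(rows, 5, seed=0)
second = sample_rows(rows, 5, seed=0)

# Same seed, same sample: downstream results can be compared run to run.
print(first == second)
```

With `seed=None` the generator is seeded from system entropy, so repeated calls may pick different rows; that is the behavior a fixed seed is meant to avoid.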

@thomass-dev thomass-dev changed the title feat: Use sampling seed in summarize_dataframe for reproducibility FEAT - Use sampling seed in summarize_dataframe for reproducibility Nov 27, 2025
@glemaitre glemaitre self-requested a review December 8, 2025 13:04
@glemaitre glemaitre removed their request for review December 8, 2025 21:22
Member

@jeromedockes jeromedockes left a comment

thanks a lot @thomass-dev @glemaitre !

@rcap107 rcap107 merged commit 3ad86df into skrub-data:main Dec 9, 2025
29 checks passed
4 participants