WIP: Implementation of Scipy Bootstrap #10871
Conversation
Thanks, this looks like a good start! Mostly small comments, but the larger change I'd consider is to express the resamplings in terms of `vmap` rather than `scan` if possible.
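For concreteness, here is a minimal sketch of what a `vmap`-based resampling could look like; the data, statistic, and function name are illustrative, not taken from the PR:

```python
import jax
import jax.numpy as jnp

# Illustrative inputs, not from the PR.
sample = jnp.arange(100.0)
statistic = jnp.mean
n_resamples = 1000

def resample_once(key):
    # Draw indices with replacement and evaluate the statistic on one resample.
    idxs = jax.random.randint(key, shape=sample.shape, minval=0, maxval=sample.shape[0])
    return statistic(sample[idxs])

# One key per resample; vmap evaluates them in parallel rather than serially.
keys = jax.random.split(jax.random.PRNGKey(0), n_resamples)
theta_hat_b = jax.vmap(resample_once)(keys)
print(theta_hat_b.shape)  # (1000,)
```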
jax/_src/scipy/stats/bootstrap.py
Outdated
```python
from typing import NamedTuple

import scipy.stats as osp_stats
```
Unused import?
jax/_src/scipy/stats/bootstrap.py
Outdated
```python
idxs = random.randint(rng, shape=(n,), minval=0, maxval=n)
# `sample` is a tuple of sample sets; we need to apply the same indexing to each sample set
resample = jax.tree_map(lambda data: data[..., idxs], sample)
next_rng = jax.random.split(rng, 1)[0]
```
This split should happen before `rng` is used to generate random numbers, e.g.

```python
rng, next_rng = jax.random.split(rng)
```

otherwise you may have subtle correlational bugs in the pseudo-random numbers.
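For illustration, a minimal sketch of the split-before-use pattern (the function and variable names here are mine, not the PR's):

```python
import jax

def resample_indices(rng, n):
    # Split first; draw from the fresh subkey and carry the other key forward.
    rng, subkey = jax.random.split(rng)
    idxs = jax.random.randint(subkey, shape=(n,), minval=0, maxval=n)
    return rng, idxs
```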
I have changed to use `vmap`, so now I do the splitting in the `vmap` argument: `jax.random.split(key, n_resamples)`.
jax/_src/scipy/stats/bootstrap.py
Outdated
```python
alpha = jnp.broadcast_to(alpha, shape)
# QUESTION: is it good practice to use vmap here?
# TODO: may need to handle nan
# TODO: handle numeric discrepancy against scipy's _percentile_along_axis
```
`vmap` is fine here, but do you need multiple vmaps if `theta_hat_b` has more than two dimensions?
Should I use control flow here to apply multiple `vmap` transforms if I find that `theta_hat_b` has more than two dimensions?
You can use Python `for` loops to apply multiple `vmap` transforms; something like this:

```python
import jax
import jax.numpy as jnp

def f(x):
    return x.sum()

x = jnp.ones((2, 3, 4, 5))
for i in range(x.ndim - 1):
    f = jax.vmap(f)
print(f(x).shape)
# (2, 3, 4)
```
jax/_src/scipy/stats/bootstrap.py
Outdated
```python
return next_rng, statistic(*resample)

# `xs` is a dummy, used simply for the sake of carrying the loop
_, theta_hat_b = lax.scan(_resample_and_compute_once, rng, jnp.ones(n))
```
Is there a reason you chose `scan` instead of `vmap`? Because it's a serial operation, `scan` is going to be far slower than `vmap` on accelerators.
Nice suggestion! For `_bootstrap_resample_and_compute_statistic`, `vmap` does run faster than `scan`.
jax/_src/scipy/stats/bootstrap.py
Outdated
```python
return idx + 1, statistic(resample)

# TODO: check if it can handle a `statistic` that returns multiple scalars
_, theta_hat_i = lax.scan(_jackknife_resample_and_compute, 0, jnp.ones(n))
```
Same comment here regarding `scan`: we should express this in terms of `vmap` instead to take advantage of parallelism on accelerators.
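As a rough illustration of the `vmap` version, one way to express leave-one-out resampling without a `scan` carry; the modular-indexing trick and names here are my own, not from the PR:

```python
import jax
import jax.numpy as jnp

def jackknife_statistics(sample, statistic):
    n = sample.shape[0]

    def leave_one_out(i):
        # Indices of every element except the i-th, with a static shape (n - 1,).
        idxs = (i + 1 + jnp.arange(n - 1)) % n
        return statistic(sample[idxs])

    # Evaluate all n leave-one-out statistics in parallel.
    return jax.vmap(leave_one_out)(jnp.arange(n))

print(jackknife_statistics(jnp.arange(5.0), jnp.mean))  # leave-one-out means
```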
I found an issue here @jakevdp: for `_jackknife_resample_and_compute_statistic`, I weirdly find that `vmap` runs SLOWER than `scan`. On a CPU Colab with 10,000 samples, `vmap` takes ~210 ms while `scan` takes ~110 ms. Here is the Colab notebook where I did my benchmarking: https://colab.research.google.com/drive/1abKv-zyI-CZ4BEtXc3oh5Ebx_rcFq0dp?usp=sharing

I am not sure if it's because my implementation is inefficient, or I have made some errors.
Interesting! On CPU, `scan` and friends are generally not too bad, but I suspect if you benchmarked on GPU/TPU you'd find `scan` to be much slower than `vmap`.
```python
standard_error: jnp.ndarray


def _bootstrap_resample_and_compute_statistic(sample, statistic, n_resamples, key):
```
This would be more consistent with JAX's typical approach if (1) `key` were the first argument, and (2) batching were handled via `vmap` at the call-site, rather than via an argument passed to the function. So the API would be something like this:

```python
def _bootstrap_resample_and_compute_statistic(key, sample, statistic):
    ...
```

and rather than calling it like

```python
_bootstrap_resample_and_compute_statistic(sample, statistic, n_resamples, key)
```

you could instead call it like

```python
keys = random.split(key, n_resamples)
vmap(_bootstrap_resample_and_compute_statistic, (0, None, None))(keys, sample, statistic)
```

The benefits would be (1) more explicit handling of key splitting by the user of the function, (2) `vmap` at the outer level may be somewhat more efficient (I'm not entirely sure of that, but I think it is the case), and (3) it's more maintainable, because it makes use of JAX's composable transforms in the way they're intended to be used, rather than hiding them behind less flexible batch-aware APIs.
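For what it's worth, a self-contained sketch of this call pattern might look like the following; here I bind `sample` and `statistic` with a closure so that `vmap` only sees the batch of keys, and all names and inputs are illustrative:

```python
import jax
import jax.numpy as jnp

def _bootstrap_resample_and_compute_statistic(key, sample, statistic):
    # Resample with replacement along the leading axis, then apply the statistic.
    n = sample.shape[0]
    idxs = jax.random.randint(key, shape=(n,), minval=0, maxval=n)
    return statistic(sample[idxs])

key = jax.random.PRNGKey(0)
sample = jnp.arange(10.0)
keys = jax.random.split(key, 1000)
# vmap over the batch of keys only; the other arguments are closed over.
theta_hat_b = jax.vmap(
    lambda k: _bootstrap_resample_and_compute_statistic(k, sample, jnp.mean)
)(keys)
print(theta_hat_b.shape)  # (1000,)
```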
```python
miss_first_sample = sample[1:]
miss_last_sample = sample[:-1]

@vmap
```
Same comment here. Can we define `_jackknife_resample_and_compute_statistic` so it natively handles a single batch, and then use `vmap` as appropriate at the call-site?
```python
alpha = jnp.broadcast_to(alpha, shape)
vmap_percentile = jnp.percentile
for i in range(theta_hat_b.ndim - 1):
    vmap_percentile = vmap(vmap_percentile)
```
Rather than vmapping, can we use the `axis` argument to `jnp.percentile`?
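For reference, a sketch of the `axis`-based version; the shapes and values here are illustrative, and note that `jnp.percentile` expects `q` in percent:

```python
import jax.numpy as jnp

theta_hat_b = jnp.ones((3, 4, 1000))  # illustrative batch of bootstrap statistics
alpha = 2.5                           # percentile, in percent
# Reduce over the resample axis directly instead of stacking vmaps.
percentiles = jnp.percentile(theta_hat_b, alpha, axis=-1)
print(percentiles.shape)  # (3, 4)
```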
```python
for i in range(theta_hat_b.ndim - 1):
    vmap_percentile = vmap(vmap_percentile)
percentiles = vmap_percentile(theta_hat_b, alpha)
return percentiles[()]
```
I don't understand the purpose of empty indexing here.
```python
# check alpha is jax array type

if vectorized not in (True, False):
```
Typically APIs don't require object identity for boolean values.
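A minimal sketch of a more conventional validation (the helper name is hypothetical):

```python
def _check_vectorized(vectorized):
    # Check the type rather than membership in (True, False), which
    # compares by equality and would, e.g., also accept 1 and 0.
    if not isinstance(vectorized, bool):
        raise ValueError(f"`vectorized` must be True or False; got {vectorized!r}")
```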
```python
@_wraps(
    scipy.stats.bootstrap,
    lax_description=_replace_random_state_by_key_no_batch_jnp_statistic_doc,
    skip_params=("batch",),
```
You can use `extra_params` here to document the `key` argument.

That said, I'm starting to wonder if this should really be considered a wrapper of `scipy.stats.bootstrap`, because its API is now substantially different. In numpy's case, we don't provide any wrappers for `numpy.random` functionality, instead using a different key-based API in `jax.random`. I'm starting to think that the same treatment would make sense here, because as written `jax.scipy.bootstrap` must be called with a different signature than `scipy.bootstrap`.

It would also solve the issue of how to handle irrelevant params like `vectorized`, and we could write the API in a way that is more typical of JAX library functions (i.e. keep batching orthogonal to the implementation, rather than calling `vmap` within).

What do you think?
@jakevdp Is there a decision/consensus on whether to adhere to the original API?
Hi @wonhyeongseo, a few notes about my PR (you can further discuss with @jakevdp to see if those points are valid to address):

Looking forward to your contribution to the ticket.
@wonhyeongseo Are you still interested in tackling this?

Hello @carlosgmartin! Not at the moment because I don't know how. Would love to see your implementation! 😊

@wonhyeongseo Thanks for letting us know. @riven314 Would you like to continue working on your PR, or would you prefer someone else take over?

@carlosgmartin

Thanks for working on this – I think given the discussion in https://jax.readthedocs.io/en/latest/jep/12049-type-annotations.html this would now be considered out-of-scope for JAX itself. Sorry we weren't able to merge your contribution!
address #10375