Skip to content

Conversation

@andyfaff
Copy link
Owner

@stsievert, @jaimefrio I'm going to write a blurb on why I think Optimizer classes should be introduced. From discussion on scipy-dev this morning it's clear that a case needs to be written, and hopefully you may be able to contribute. This is a location to put that text (not intended for eventual merge into master). Feel free to put requests against this feature branch in PEP/1-Optimizer.md. Not wedding to MD, could be rst, etc. If you think it could be done in a better way (e.g. google docs) let me know.

@stsievert
Copy link
Collaborator

A useful link: scipy#6326, which details a similar ODE refactor.

@stsievert
Copy link
Collaborator

I think we're almost ready to start writing. The outline is looking pretty complete.

I think we should highlight that minimize is (effectively) trying to be a class. I fleshed that out more and brought it to the top. Your most recent commit made me realize this is an issue.

I think we should follow the format of Python PEPs, https://github.com/python/peps. We can leave the outline in Markdown.

@andyfaff
Copy link
Owner Author

Some are literal black boxes and implemented in Fortran/C.

I don't think we should say that we're going to extract the Fortran/C code back into Python. That's really well tested over many years, and would be resisted by the community.

LBFGSB has Fortran, but it returns every step, meaning we can have an Optimizer object. If there are minimizers/solvers that run to completion (leastsq?) within C/Fortran, then I'm not sure that we could translate those easily to Optimizer.

@andyfaff
Copy link
Owner Author

Yes, leastsq is totally Fortran contained - little chance of making an Optimizer out of that, unless it's possible to call Fortran step by step, but that would lead to overhead (possibly unacceptable for many).

BTW I did a little bit of profiling for NelderMead. minimize rosen for starting [1, 2, 3, 4]
fmin is around 24ms
NelderMead is around 29ms.
So the class based approach is around 20% slower. Not sure where that comes from. cython might recover that.

@andyfaff
Copy link
Owner Author

  • minimize is trying to be a class
    • method: should be subclasses

Playing devils advocate - the original solvers were fmin, f_min_bfgs, etc. Then they were collected together in minimize to introduce uniformity, which occurred to a degree. Now we're proposing that they separate again to individual classes. You can see how people can be suspicious of yet another re-write.

@stsievert
Copy link
Collaborator

they were collected together in minimize to introduce uniformity, which occurred to a degree

Classes introduce another level of uniformity, correct? Subclasses can inherit from the base class, and with that comes all the parameters and functions. And overridden functions should return values of the same type/shape/etc (though there's no way to really check).

Here are issues that have arisen because of special cases:

@stsievert
Copy link
Collaborator

Can you give the proposal a read over? I added a section on goals and want to make sure we're on the same page.

@andyfaff
Copy link
Owner Author

andyfaff commented Feb 18, 2018

One issue is the point

hides all details. Some are literal black boxes and implemented in Fortran/C.

I don't think we want to be pulling code out of Fortran and C. As it's written it gives the impression that we're going to do all that calculation in Python. That's an enormous task. The core of the proposal is that pretty much all optimisers have a loop, and that loop can be naturally run as an iterator . So for all the optimizers that have their loop in Python, the loop is extracted and put into __next__. For example LBFGSB has a loop, in that loop there is a call to Fortran. That Fortran call will stay as is, but the loop that calls it will now form the core of the __next__ method.

However, for something like leastsq the entire thing is in external code (minpack), calling back into python for func evaluation. At this stage we don't want to be delving into minpack and changing it. However, with the design we have it will allow someone at a future stage to do that.

It's not obvious what is meant by "black box". Replacing current loops with an iterator is not going to cure that.

@andyfaff
Copy link
Owner Author

I have another idea, which might sway the argument for a proportion of the community. If we put Optimizer, Function and the Optimizer subclass into Cython it might allow the use of a LowLevelCallable. I know that this provides a big speedup for things like integration where the entire operation then happens in C, it may be that it we could get big speed improvements if everything can happen in Cython, without having to call back into Python.
However, I'm not super familiar with Lowlevelcallable, it'll need some investigating.

@andyfaff
Copy link
Owner Author

bug reports, personal experience and anecdotal evidence.

We need hard evidence here. @jaimefrio, do you reckon you'd be able to chip in here?

@andyfaff
Copy link
Owner Author

What is meant by a "minimal class interface"?

@andyfaff
Copy link
Owner Author

Not sure that multiprocessing is immediately introducible as a result of this PR

@stsievert
Copy link
Collaborator

stsievert commented Feb 18, 2018

Thanks for your comments, this is starting to get fleshed out. I've made some edits; could you take another look?

Not sure that multiprocessing is immediately introducible as a result of this PR

Maybe we should mention multiprocessing in a "possible future work" section or "possible third party work this enables"? I can point to certain async optimization algorithms that require multiprocessing. I'm fine leaving this out.

Optimizer, Function and the Optimizer subclass into Cython

You seem to be on top of this; I'll let you handle it.

bug reports, personal experience and anecdotal evidence. ... We need hard evidence here

We should also ask the managers of other libraries this question. Let's finish a draft of the report then ask them. I've added a note at the top of the draft about what to ask them about.

It's not obvious what is meant by "black box". Replacing current loops with an iterator is not going to cure that.

What if I don't want my optimizer to do a line search, or what if I want to change some of the hyperparameters in the line search? I think the methods that use a line search should have a function Optimizer._line_search. This would allow the advanced user to modify it or any hyperparameters.

What is meant by a "minimal class interface"?

One that wasn't specific to minimize or even SciPy I guess. I think I was imagining some class Optimizer with 4 subclasses, one for each of minimize, minimize_scalar, root and linprog. I'd imagine this class has a couple basic functions (__next__, results, converged, ...) that all raise a NotImplementedError.

Then, each of the methods in those functions could subclass from their parents (e.g., Optimize is a parent of Minimize is a parent of NealderMead). This could be used in solve_ivp I guess too.

I'm not saying this proposal should implement all 4 subclasses of Optimizer, but I think we should lay the groundwork for any future work.

@stsievert
Copy link
Collaborator

I need to rework the outline. In the second section, we list briefly list the features of this rewrite. The rest of the paper should be expand upon each of those features.

@andyfaff
Copy link
Owner Author

I've been pondering a bit more. I can understand the reluctance to have 'another API', and whilst that's probably the main issue you're concerned with, I think for me by far the main attraction is that the changes will make maintenance and introducing new features a lot easier. I think we should give the latter more prominence.

One of my other thoughts, and a lesson I learnt with writing DifferentialEvolutionSolver, is that it could be a good idea to keep the Optimizer/Function classes private for a couple of releases. They would still be the core functionality for the current fmin, etc, but this approach has the advantage of:

  1. being able to tweak implementation of the classes before they become public and it's too late to change anything.
  2. being able to tweak implementation if we discover a problem whilst implementing other scalar minimizers.
  3. Give us time to convert the other scalar minimizers before a big release.

@stsievert
Copy link
Collaborator

keep the Optimizer/Function classes private for a couple of releases

I also think that's a good idea. I hadn't considered it but am on board.

I think we should give the latter more prominence.

I've moved the "other API" section to the further down.

luzpaz and others added 2 commits February 22, 2018 22:08
andyfaff and others added 3 commits February 23, 2018 07:51
TST: doctest changes for NumPy 1.14.1
* DOC: Example for integrate.romb

* code formating fix

* fix

* Update quadrature.py
ENH: implemented Owen's T function
@stsievert
Copy link
Collaborator

I think we should follow the outline of an actual PEP. This is what I plan to do next.

rgommers and others added 8 commits March 6, 2018 23:34
DOC: Better, shared docstrings in ndimage
MAINT: Move spline interpolation code to spline.c
* CircleCI: Share data between jobs

* Explicitly add deploy key fingerprint.

* Try new deploy key

* CI: re-enable circleci master branch deploy filter
@andyfaff
Copy link
Owner Author

andyfaff commented Mar 8, 2018

As with anything I'd like to see the evidence for that before concurring. For example, I thought Cython would speed up this PR, but there were no significant gain when I experimented with that. I hope that the real gain would be to use LowLevelCallable from Cython to reduce overhead from Python function calls.

@andyfaff
Copy link
Owner Author

andyfaff commented Mar 8, 2018

@rgommers, I made the formatting changes you suggested, such as the header table. I think we're happy with the PEP to be placed in a scipy/* repo at this point.

@stsievert
Copy link
Collaborator

@rgommers I'm also satisfied with the proposal (though it's not perfect and could use improvement).

I think SEP or SPEP are the best acronyms. Maybe SciPEP?

person142 and others added 2 commits March 8, 2018 21:12
Remove commented out function declarations and fix whitespace in
`cephes.h` and remove accidental semicolons in `cephes.pxd`.
@rgommers
Copy link

rgommers commented Mar 9, 2018

Thanks, looks good. I'll have a look at creating that repo now.

@rgommers
Copy link

rgommers commented Mar 9, 2018

Okay either https://github.com/numpy/neps is still in flux, or we are keeping the sources for NEPs within the numpy repo itself after all, I can't remember. If the answer is the latter, let's put SciPEPs at doc/source/scipeps in the SciPy repo.

amanp10 and others added 7 commits March 9, 2018 12:26
ENH: scipy.linalg.eigsh cannot get all eigenvalues
* MAINT: fftpack: Use correct comment character in fftpack.pyf.

f2py interface files don't use C-style comments.  This fixes the following
message that occurs in the build log:

    f2py: scipy/fftpack/fftpack.pyf
    Reading fortran codes...
        Reading file 'scipy/fftpack/fftpack.pyf' (format:free)
    Line #86 in scipy/fftpack/fftpack.pyf:"       /* Single precision version */"
        crackline:2: No pattern for line

* MAINT: optimize: Add missing 'end interface' statement in nnls.pyf.

This fixes the following message that occurs in the build log:

    f2py: scipy/optimize/nnls/nnls.pyf
    Reading fortran codes...
        Reading file 'scipy/optimize/nnls/nnls.pyf' (format:free)
    crackline: groupcounter=1 groupname={0: '', 1: 'python module', 2: 'interface', 3: 'subroutine'}
    crackline: Mismatch of blocks encountered. Trying to fix it by assuming "end" statement.
…chet

DOC: stats: invweibull is also known as Frechet or type II extreme value.
* Remove numerical arguments from convolve2d

* Reword error message
MAINT: cleanup
andyfaff pushed a commit that referenced this pull request May 14, 2020
andyfaff pushed a commit that referenced this pull request Jan 5, 2022
* Revert to branch base commit for future rebasing.
Re-add changes.
Reformat to less than 79 chars.

* fix pycodestyle warnings

* Exclude vscode and venv folder from git

* Move levy stable to separate module.
Move fft characteristic function to separate module.
Remove inline cotes numbers and use those from integrate module.
Implement suggestions by ragibson to improve accuracy for DNI calc.
Use a global epsabs for all quad integrals, set to 1e-12.
Improve piecewise accuracy by shrinking epsabs.
TODO: extract greater sample space from Nolan's stablec.

* Stable distribution improvements.

Introduce epsabs parameter for integration.
Introduce larger sample space for pdfs based on percentiles.
Adapt pdf unit tests and check relative errors.
Fallback to normal pdf if alpha==2.

* Address primary numerical difficulties in Nolan's levy_stable integration

Add integrand support and tail hints to improve QUADPACK accuracy
Add some numerical difficulties safeguards from Nolan's STABLE
Re-express Nolan's g function to improve numerical stability
Eliminate dependence on np.complex128 exponentiation
Refactor Nolan's functions to reduce code repetition and improve performance

* New cdf test data. Switch to relative err.
Add some analytical pdf results.
Temporarily re-introduce old cdf piecewise as has less problem points.

* Further refine levy_stable piecewise calculations

Change epsabs to epsrel in piecewise quadrature (with epsabs=0 to match STABLE)
Change epsrel in piecewise quadrature (to avoid default epsrel=1.49e-08)
Remove CDF exponent height samples (they turned out to be unnecessary and seemed to hurt accuracy)
Significantly reduce (improve) tolerances in the piecewise tests and fix incorrect usage of assert_almost_equal

* Add tests for Nolan's piecewise parameter rounding

This revealed some issues, so _nolan_round_difficult_input was tweaked
The rounding tolerances also had to be increased

* Revert "Add simple tests for Nolan's piecewise parameter rounding"

* Filter round off integration warnings.
Update integrands for pdf/cdf to ignore anything outside [0,inf).

* Move warning supression to tests.

* Use full_output to supress IntegrationWarnings

* Fix some linting issues.
Support all greek letters in unicode-check.

* Fix missed linting issues

* Multiple parameterisations (#6)

* initial support for S0 and S1 parameterisations
* Support fitstart with mult parameterization.
* Add option to switch between S0 and S1.
* Add KS test for rvs with diff params.
* Correct typo in rvs when alpha == 1
* fix pdf / cdf translations.
* Support mixed alpha/beta params in pdf/cdf
* Move sample files into own folder.
* Trim rvs test pairs.
* Add location / scale tests.
* Regroup options into one place.
* Filter point in dni test failing on some platforms.

* Rewrite nolan routines in C.
Remove reliance on floating point architectures.

* Use c free, not PyMem_free.
(fixes single seg fault on travis Python 3.7 Debug)

* Fix issue with returning NaN in analytic case

Both the CDF and PDF would return NaN outside the distribution support
when alpha == 0.5 and beta == +-1.0.

Note that both cases use an analytic result from a Levy distribution
with mu=0.0, so the support is [0.0, infty) in the S0 parameterization.

* Add simple cdf/pdf test past distribution support

* Incorporate steppi's review comments, with minor tweaks

* TST: Refactor test_rvs for levy_stable

* TST: Move loading of stablec data into fixtures

* TST: Add reg, slow, xslow cases for stable pdf

* TST: Add reg, slow, xslow tests for stable cdf

* TST: Clean up formatting of test cases

np.isin wrapper npisin has been removed since numpy 1.8.2 is no longer
supported

* TST: Remove extraneous comma (to be squashed)

* TST: Move loc scale sample data to fixture

* TST: parametrize levy_stable test_stats

* TST: Parametrize levy_stable alpha 1 beta nonzero

* TST: parametrize test levy_stable outside support

* TST: parametrize test levy_stable loc scale

* TST: Add todo about filtered out test cases

* MAINT: Remove extraneous import

* MAINT: Resolve linting issues

* MAINT: Remove trailing whitespace in Nolan PDF test

Interestingly, the CI didn't give the right line number here, so we'll see if this resolves the issue.

* Resolve fft_interpolation_kind inconsistency

The fft pdf calculation uses scipy.interpolate.interp1d for interpolating,
whereas the fft cdf calculation uses scipy.interpolate's
InterpolatedUnivarateSpline. The latter has a method integral for
calculating the integral of the spline between two points which is
used in the cdf calculation. InterpolatedUnivariateSpline does not
have the kind parameter of interp1d which can take integer or string
arguments, but as a corresponding parameter k which only takes integer
arguments and specifies the degree of the spline. The existing
behavior is to not allow the spline degree to be specified for the fft
cdf calculation. This commit changes
both the pdf and cdf to use InterpolatedUnivariateSpline, and uses an
analog of fft_interpolation_kind in both cases. It is now called
fft_interpolation_degree to emphasize that it is not equivalent to the
kind parameter of interp1d. Separate class variables have also been
created for the pdf and cdf interpolation parameters.

* Remove greek letters as allowed characters

If this change is desired it should be made in it's own PR.

* DOC: Make small edits to some comments

* DOC: Revise docstring

* DOC: Make revisions to docstring

* TST: Add failing test for fit and rvs

* TST: Add random_state to rvs in test

* TST: Change obsolete variable name

* Loosen tolerance

* DOC: Update reason for fail to mention active PR

* MAINT: Remove duplicate import in test_distributions

It looks like this import was added recently in upstream, so the one added in this PR became a duplicate.
This should hopefully resolve the one remaining linting issue.

* MAINT: Resolve line length linting issue in _levy_stable_distn

This is the only linting issue that appears locally for me in this file
No relevant linting issues appear in test_distributions either, so this one hopefully passes the CI check

* MAINT: Pull unicode-check version from upstream

* MAINT: Appease mypy regarding array construction

* MAINT: Add levyst extension to mypy.ini ignore list

* MAINT: Get rid of unnecessary asanyarray

* MAINT: Change conditions to avoid divide by zero

* MAINT: Consolidate pdf_from_cf_with_fft

* MAINT: Consolidate levy_stable code into __init__

Co-authored-by: ragibson <[email protected]>
Co-authored-by: Albert Steppi <[email protected]>
Co-authored-by: steppi <[email protected]>
Co-authored-by: Ryan Gibson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.