Description
It is known that adaptive stochastic optimizers such as Adam, Adagrad, and RMSprop can fail to converge. Moreover, the test error of models trained with them can be larger than that of models trained with SGD, even if they attain lower training error than SGD - in other words, they can overfit. See http://papers.nips.cc/paper/7003-the-marginal-value-of-adaptive-gradient-methods-in-machine-learning.pdf .
Recently, AdaBound was proposed (https://openreview.net/pdf?id=Bkg3g2R9FX) to overcome the above-mentioned problems of adaptive optimizers. The algorithm combines the best of both worlds: (a) it makes fast progress early in training, like the adaptive methods, and (b) it attains test accuracy similar to or better than that of SGD.
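To make the idea concrete, here is a minimal functional sketch of the AdaBound update rule in JAX, loosely following the reference implementation that accompanies the paper: Adam-style moment estimates, but with the element-wise step size clipped into dynamic bounds that converge to a final SGD-like learning rate. The function names (`adabound_init`, `adabound_update`) and the default hyperparameters are illustrative assumptions, not an existing JAX API, and the final design would of course follow JAX's optimizer conventions.

```python
# Sketch only: not an existing JAX API; names and defaults are placeholders.
import jax
import jax.numpy as jnp


def adabound_init(params):
    """Zero first- and second-moment estimates with the same pytree structure as params."""
    m = jax.tree_util.tree_map(jnp.zeros_like, params)
    v = jax.tree_util.tree_map(jnp.zeros_like, params)
    return m, v


def adabound_update(params, grads, m, v, step,
                    lr=1e-3, final_lr=0.1, gamma=1e-3,
                    b1=0.9, b2=0.999, eps=1e-8):
    """One AdaBound step: Adam-style moments, with the per-element step size
    clipped into bounds that converge to final_lr, so the optimizer behaves
    adaptively early on and more like SGD later in training."""
    t = step + 1  # 1-indexed step count
    # Dynamic bounds on the element-wise learning rate; both approach final_lr.
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    bias_c1 = 1.0 - b1 ** t
    bias_c2 = 1.0 - b2 ** t

    # Exponential moving averages of the gradient and its square (as in Adam).
    m_new = jax.tree_util.tree_map(lambda m_, g: b1 * m_ + (1.0 - b1) * g, m, grads)
    v_new = jax.tree_util.tree_map(lambda v_, g: b2 * v_ + (1.0 - b2) * jnp.square(g), v, grads)

    def apply_leaf(p, m_, v_):
        denom = jnp.sqrt(v_) + eps
        adam_step = lr * jnp.sqrt(bias_c2) / (bias_c1 * denom)
        # The clipping of the step size is what distinguishes AdaBound from Adam.
        step_size = jnp.clip(adam_step, lower, upper)
        return p - step_size * m_

    new_params = jax.tree_util.tree_map(apply_leaf, params, m_new, v_new)
    return new_params, m_new, v_new
```

In a training loop one would initialize `(m, v)` with `adabound_init(params)` and thread the step counter through successive `adabound_update` calls; since the update is a pure function over pytrees, it composes cleanly with `jax.jit` and `jax.grad`.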
Would this algorithm be welcome in JAX? If so, I would be interested in implementing it.