Description & Motivation
In `Trainer`, when using `precision="16-true"` or `precision="16-mixed"`, small float values (roughly below 1e-7 to 1e-8; float16's smallest subnormal is about 6e-8) can underflow to zero (in input batches, model parameters and their gradients, and optimizer hyper-parameters), leading to NaNs.
This does not raise any warning or error during training, as the training loop simply ignores NaNs.
After training, in inference mode, every output can be NaN (which makes for a poor model).
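To illustrate the underflow itself, here is a minimal check in plain PyTorch (printed values are approximate; they follow from float16's ~6e-8 smallest subnormal):

```python
import torch

# Values below float16's smallest subnormal (~6e-8) silently become zero.
x = torch.tensor([1e-6, 1e-7, 1e-8, 1e-9], dtype=torch.float32)
print(x.to(torch.float16))   # -> [~1.0e-06, ~1.2e-07, 0.0, 0.0]: the last two underflow
print(x.to(torch.bfloat16))  # -> all non-zero: bfloat16 keeps float32's exponent range
print(torch.finfo(torch.float16).tiny)  # 6.1035e-05, smallest *normal* float16 value
```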
After debugging for several days, I found issues in my parameter weights (both from fresh initialization and from pre-trained checkpoints), but also with the very commonly used Adam optimizer: with its default `eps=1e-8`, it was updating all my parameters to NaN in the very first step, even when the learning rate was set to 0 (presumably because `eps` itself underflows to 0 in float16, making Adam's denominator `sqrt(v) + eps` zero, and the resulting NaN survives multiplication by a zero learning rate).
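For reference, a minimal sketch of that Adam failure mode as I understand it, assuming float16 parameters (the `16-true` case) and stock `torch.optim.Adam`; exact behavior may vary across PyTorch versions:

```python
import torch

# One float16 parameter with a zero gradient: it should not move at all.
p = torch.nn.Parameter(torch.tensor([1.0], dtype=torch.float16))
opt = torch.optim.Adam([p], lr=0.0, eps=1e-8)  # default eps, lr=0 on purpose

p.grad = torch.zeros_like(p)
opt.step()
print(p)  # tensor([nan], ...): eps=1e-8 underflows to 0 in float16, so the
          # update divides 0 by 0, and the NaN survives even a zero lr
```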
Pitch
I would suggest adding a caveat to the documentation warning users that they have to handle every possible small value in their model, optimizer, and input data. In particular, changing the optimizer's default values seems required (in my case at least).
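As an illustration of the kind of change I mean (my own numbers, not an official recommendation): pick an `eps` that is still representable in float16:

```python
import torch

model = torch.nn.Linear(4, 4)  # placeholder model
# float16's smallest normal value is ~6.1e-5, so eps=1e-4 stays representable;
# the default eps=1e-8 silently becomes 0 under float16.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-4)
```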
Also, some default warning when autocasting to lower precision leads to overflow or underflow would be nice.
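Until something like that exists, users can roll their own check; here is a minimal sketch of a Lightning `Callback` that fails fast when parameters turn NaN (the class name and placement are mine, not an existing Lightning feature):

```python
import torch
import lightning.pytorch as pl

class NanParamCheck(pl.Callback):
    """Stop training as soon as any parameter becomes NaN."""

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        for name, param in pl_module.named_parameters():
            if torch.isnan(param).any():
                raise RuntimeError(f"parameter {name!r} became NaN at batch {batch_idx}")

# usage: trainer = pl.Trainer(precision="16-true", callbacks=[NanParamCheck()])
```

(`Trainer(detect_anomaly=True)` already exists, but as far as I can tell it only covers NaNs produced during backward, not the optimizer-step underflow described above.)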
Thanks for the nice lib by the way
(edit: added thanks + details about which precisions lead to NaNs; `bf16-mixed` and `bf16-true` were computing fine in my case)
Alternatives
No response
Additional context
No response