
Conversation

janEbert
Contributor

Only relevant if elementwise_affine=True.
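
A quick sketch of what that note means in practice, assuming the `bias` keyword added here behaves as described (sizes are arbitrary):

```python
import torch.nn as nn

# bias only matters when elementwise_affine=True:
ln = nn.LayerNorm(64, elementwise_affine=True, bias=False)
print(ln.weight is not None, ln.bias is None)   # True True

# With elementwise_affine=False there are no learnable parameters at all,
# so the bias flag has no effect:
ln = nn.LayerNorm(64, elementwise_affine=False)
print(ln.weight is None, ln.bias is None)       # True True
```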

@pytorch-bot

pytorch-bot bot commented May 17, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/101683

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9ff21d5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg drisspg added the triaged label May 17, 2023
@jbschlosser
Contributor

Hey @janEbert, thanks for the PR - is there an associated issue for this?

@mikaylagawarecki
Contributor

For my understanding, is this change related to the porting of RMSNorm from this comment?

@janEbert
Contributor Author

Hey @jbschlosser, I tried querying issues and PRs matching "LayerNorm without bias" and similar, but didn't find anything. I haven't opened an issue for this but can do it if it makes administration easier.
Also if you have a tip for searching in very large open-source projects like PyTorch, I'd really appreciate it for the future!

The motivation for this PR is T5-style models, as discussed in the issue @mikaylagawarecki linked (which I didn't find in my search). The PaLM paper also reports increased training stability for large models when biases, including the LayerNorm bias, are removed.


@albanD mentioned:

yes bias is not great as it is usually just the bias weight but here it also means the centering is also removed (you don't remove the average bias). Do you think rms_only=True would be a better name?

Should this be discussed? I personally think `bias` is clear enough, given the consistency with other PyTorch layers and that "centering" in the statistical sense usually refers to the mean.
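
To illustrate the consistency argument, here is a minimal sketch assuming the `bias` keyword proposed in this PR (the feature size 512 is just an example):

```python
import torch.nn as nn

# The same keyword disables the additive bias term across modules:
linear = nn.Linear(512, 512, bias=False)
ln = nn.LayerNorm(512, bias=False)           # the argument added by this PR
print(linear.bias is None, ln.bias is None)  # True True
```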

@janEbert
Contributor Author

janEbert commented Jul 5, 2023

Hey, any new opinions on this? I'd be really happy to see this merged so that the PyTorch Transformers API becomes more flexible for scaling up. :)

@mikaylagawarecki mikaylagawarecki self-assigned this Jul 7, 2023
@mikaylagawarecki
Contributor

mikaylagawarecki commented Jul 14, 2023

@janEbert Apologies for the delay, so my understanding (which perhaps you were getting at) is that

LayerNorm is given by

$y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$
where the $\epsilon$ is added for numerical stability

With this PR, LayerNorm(elementwise_affine=True, bias=False) computes

$y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma$

and RMSNorm is given by

$y = \frac{x}{\mathrm{RMS}(x)} * \gamma$ (with maybe an optional $+ \beta$ per this comment)

Where

$\mathrm{Var}[x] = \frac{1}{n}\sum\limits_{i=1}^{n} (x_i - \mathrm{E}[x])^2$

and

$\mathrm{RMS}(x) = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n} x_i^2}$ ($\epsilon$ could also be used for numerical stability in this calculation)

so RMSNorm differs from LayerNorm on 3 counts:

  1. it doesn't subtract the expectation in the numerator
  2. it uses $\mathrm{RMS}(x)$ rather than $\sqrt{\mathrm{Var}[x] + \epsilon}$ in the denominator
  3. (perhaps) it doesn't learn an elementwise affine bias

And this change is completely separate from RMSNorm. In conclusion, we are okay with adding a bias argument here to disable the affine bias for consistency with other modules (e.g. Bilinear, Linear) :)
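
To make the distinction concrete, a small numerical sketch (assuming the `bias` keyword from this PR; the tensor size and `eps` value are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 16)
eps = 1e-5

# LayerNorm(elementwise_affine=True, bias=False) from this PR:
# y = (x - E[x]) / sqrt(Var[x] + eps) * gamma
ln = nn.LayerNorm(16, eps=eps, bias=False)
mean = x.mean(-1, keepdim=True)
var = x.var(-1, unbiased=False, keepdim=True)
manual_ln = (x - mean) / torch.sqrt(var + eps) * ln.weight
print(torch.allclose(ln(x), manual_ln, atol=1e-5))  # True

# RMSNorm-style: y = x / RMS(x) * gamma, with no mean subtraction.
rms = torch.sqrt(x.pow(2).mean(-1, keepdim=True) + eps)
manual_rms = x / rms * ln.weight
print(torch.allclose(manual_ln, manual_rms))  # False in general (unless the mean of x is ~0)
```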

@mikaylagawarecki
Contributor

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased layernorm_nobias onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout layernorm_nobias && git pull --rebase)

@mikaylagawarecki mikaylagawarecki added the release notes: nn and topic: improvements labels Jul 14, 2023
@janEbert
Contributor Author

Thank you so much @mikaylagawarecki, that's an amazing summary that clears up any misunderstandings!

@mikaylagawarecki
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Jul 17, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here
pytorchmergebot pushed a commit that referenced this pull request Jul 26, 2023
As used by T5 and PaLM, citing "increased training stability for large models" (https://arxiv.org/abs/2204.02311).

Depends on #101683, which allows disabling bias for `LayerNorm`s. Marked as draft due to this.
Pull Request resolved: #101687
Approved by: https://github.com/mikaylagawarecki
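
For reference, a minimal sketch of what such an RMS-style normalization module could look like; this is only an illustration of the formula discussed above, not the implementation from #101687, and the class name `SimpleRMSNorm` and its defaults are hypothetical:

```python
import torch
import torch.nn as nn

class SimpleRMSNorm(nn.Module):
    """Illustrative RMS normalization: y = x / RMS(x) * gamma, no centering, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # gamma, like LayerNorm's weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS(x) = sqrt(mean(x^2) + eps), computed over the last dimension.
        rms = torch.sqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x / rms * self.weight
```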