InverseMVNLayer #1979
See #1938 (comment) regarding the case of mean subtraction.
Could our Brewers give some indication of whether they would prefer the MVNLayer fixes to be separated out of this PR, as cdoersch suggests here? It makes abundant sense that reviewing bug fixes for an existing feature is a higher priority than reviewing code for a proposed new feature, especially since the MVNLayer fixes are a smallish subset of the changes in this PR. I can do that, but it's a fair bit of work and I don't want to expend the effort if it is no more likely to get review attention. Thanks.
@shelhamer Are there any plans to merge this? If not, at least the fix to the MVNLayer gradient should be cherry-picked in. The gradient is currently incorrect (and it's not detected by the gradient checker because the test is also incorrect).
The MVNLayer fixes were put in 2964; InverseMVNLayer doesn't seem interesting to others.
Replaces 1895.
This PR extends the MVNLayer so that the mean and variance blobs can be exported as top blobs. It also adds a new layer type, InverseMVNLayer, which takes the mean and variance as bottom blobs and performs the inverse operation (adding the mean back and denormalizing for variance). A use case for this is an autoencoder that feeds input into the MVNLayer and generates the output from the InverseMVNLayer, with autoencoding layers in between.
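To make the forward and inverse operations concrete, here is a minimal plain-Python sketch of the idea, not the PR's C++ implementation. The function names `mvn_forward` and `inverse_mvn_forward`, the flat-list input, and the `eps` stabilizer are assumptions made for illustration only.

```python
import math

EPS = 1e-9  # hypothetical stabilizer; the value is not taken from the PR


def mvn_forward(values, eps=EPS):
    """Normalize to zero mean and unit variance.

    Also returns the mean and variance, mirroring the idea of exporting
    them as extra top blobs.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    scale = math.sqrt(variance) + eps
    normalized = [(v - mean) / scale for v in values]
    return normalized, mean, variance


def inverse_mvn_forward(normalized, mean, variance, eps=EPS):
    """Undo mvn_forward: rescale by the standard deviation, add the mean back."""
    scale = math.sqrt(variance) + eps
    return [y * scale + mean for y in normalized]
```

In the autoencoder use case, the layers between the two calls would transform `normalized`, and `inverse_mvn_forward` would map their output back to the original input scale using the saved statistics.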
This PR also fixes a problem in the existing MVNLayer, which computes the variance as E(X^2) - (EX)^2. When the input has exactly zero variance (e.g. a solid-color RGB image with across_channels=false), the computed result usually isn't exactly zero but has small negative values due to floating-point rounding. The subsequent square root operation then produces NaN.
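One common way to guard against this is to clamp the computed variance at zero before taking the square root. The sketch below illustrates that approach in plain Python; it is an assumption about the general technique, not a copy of the fix in this PR, and the name `safe_std` is hypothetical. Note that Python's `math.sqrt` raises `ValueError` on a negative input rather than returning NaN as the C++ code does.

```python
import math


def safe_std(values):
    """Standard deviation via E(X^2) - (EX)^2, clamped to avoid a negative variance."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(v * v for v in values) / n
    # Rounding error can leave this slightly below zero for constant input.
    variance = mean_sq - mean * mean
    # Clamp so the square root is always defined.
    variance = max(variance, 0.0)
    return math.sqrt(variance)
```

Calling `safe_std([0.1] * 1000)` returns a value at or very near zero regardless of which way the rounding error falls.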