
Conversation

jeffdonahue (Contributor)

The eltwise product layer's forward computation is (given inputs x, y, z) p := x .* y .* z. Previously the gradient w.r.t. x was computed as p ./ x (and analogously for y and z). This change makes the layer compute the gradient w.r.t. x as y .* z by default, which is asymptotically slower in the number of inputs (O(n^2) instead of O(n)) but more stable than dividing by the potentially near-zero x. For the case of two inputs (probably 99% of the uses of this layer, including the CIFAR example) it is actually faster (just copy the other input) and more accurate. If you have many inputs and you are sure dividing by them will not cause instability (e.g., because you specifically conditioned the inputs to be bounded away from zero), you can still set the stable_prod_grad: false option to get the old method. A sketch of both backward passes follows below.
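For concreteness, here is a minimal standalone sketch of the two backward passes (plain C++, not the actual Caffe layer code; the function names and vector-of-vectors layout are illustrative assumptions, not Caffe's API). `n` is the number of bottom blobs and `d` the number of elements per blob:

```cpp
#include <cstddef>
#include <vector>

// Old, division-based backward: grad_i = top_diff .* p ./ x_i.
// One pass per input (O(n) total work over the data), but divides by the
// potentially near-zero x_i, which can blow up to Inf/NaN.
void prod_backward_unstable(const std::vector<std::vector<float> >& bottoms,
                            const std::vector<float>& top_data,   // p = prod of inputs
                            const std::vector<float>& top_diff,
                            std::vector<std::vector<float> >& bottom_diffs) {
  const std::size_t d = top_data.size();
  for (std::size_t i = 0; i < bottoms.size(); ++i) {
    for (std::size_t k = 0; k < d; ++k) {
      bottom_diffs[i][k] = top_diff[k] * top_data[k] / bottoms[i][k];
    }
  }
}

// New, stable backward: grad_i = top_diff .* prod_{j != i} x_j.
// No division, so no blow-up near zero; O(n^2) passes over the data in
// general, but for n == 2 the inner product is just "copy the other input",
// which is both faster and exact.
void prod_backward_stable(const std::vector<std::vector<float> >& bottoms,
                          const std::vector<float>& top_diff,
                          std::vector<std::vector<float> >& bottom_diffs) {
  const std::size_t n = bottoms.size();
  const std::size_t d = top_diff.size();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < d; ++k) {
      float prod = 1.0f;
      for (std::size_t j = 0; j < n; ++j) {
        if (j != i) prod *= bottoms[j][k];
      }
      bottom_diffs[i][k] = top_diff[k] * prod;
    }
  }
}
```

Both compute the same derivative wherever x_i != 0; they differ only in how the product of the other inputs is obtained.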

This division by near-zero values was causing NaNs in the cifar_full example, which uses the eltwise product as part of the WITHIN_CHANNEL LRN computation.
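For reference, opting back into the old division-based gradient is done through the layer's eltwise_param. A rough prototxt sketch (the stable_prod_grad field is in EltwiseParameter in caffe.proto; the surrounding layer syntax varies by Caffe version, so treat the rest as illustrative):

```
layers {
  name: "prod"
  type: ELTWISE
  bottom: "x"
  bottom: "y"
  bottom: "z"
  top: "p"
  eltwise_param {
    operation: PROD
    stable_prod_grad: false  # revert to the p ./ x gradient
  }
}
```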

jeffdonahue added a commit that referenced this pull request on Aug 26, 2014:
Make eltwise product gradient stabler (fixes issues with WITHIN_CHANNEL LRN in cifar_full example)
jeffdonahue merged commit 5a0ad46 into BVLC:dev on Aug 26, 2014.
mitmul (Sep 30, 2014) and RazvanRanca (Nov 4, 2014) later pushed commits referencing this pull request to their forks.
