Sigmoidlayer #106
Conversation
Several recent research results all recommend using ReLU. My personal micro-benchmark experiments reach a similar conclusion: ReLU is not only several times faster but also often produces better accuracy. [1] Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
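For context, here is a minimal sketch (not part of this pull request; the function names are illustrative) of the two nonlinearities being compared in this thread:

#include <algorithm>
#include <cmath>

// Illustrative definitions, not taken from the Caffe sources:
//   ReLU:    f(x) = max(0, x)           -- one comparison, no exp()
//   sigmoid: f(x) = 1 / (1 + exp(-x))   -- one exp(), one division
inline double relu(double x)    { return std::max(0.0, x); }
inline double sigmoid(double x) { return 1. / (1. + std::exp(-x)); }

The cost difference the benchmark refers to comes largely from the exp() call in the sigmoid versus the single comparison in ReLU.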
src/caffe/layers/sigmoid_layer.cu
Outdated
Should put your copyright :)
I am not sure about this, but could this possibly lead to errors when, e.g., x is a very large negative number? An if statement handling the x > 0 and x < 0 cases separately might help (although it may hurt performance).
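For illustration, a minimal sketch of the branching approach suggested above (not part of this pull request; stable_sigmoid is a hypothetical name). For x < 0 the expression is rewritten as exp(x) / (1 + exp(x)), so exp() is only ever evaluated on a non-positive argument and cannot overflow:

#include <cmath>

// Branch on the sign of x so exp() never sees a large positive argument.
inline double stable_sigmoid(double x) {
  if (x >= 0) {
    return 1. / (1. + std::exp(-x));  // exp(-x) <= 1, no overflow
  } else {
    const double e = std::exp(x);     // x < 0, so e < 1, no overflow
    return e / (1. + e);              // algebraically equal to 1 / (1 + exp(-x))
  }
}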
At least empirically we should be fine. I ran the following program:
#include <iostream>
#include <cmath>
#include <limits>

// Prints and returns the sigmoid of x.
inline double sigmoid(double x) {
  double val = 1. / (1. + exp(-x));
  std::cout << "f(" << x << ") = " << val << std::endl;
  return val;
}

int main() {
  sigmoid(std::numeric_limits<double>::max());
  sigmoid(-std::numeric_limits<double>::max());
  sigmoid(std::numeric_limits<double>::min());
  sigmoid(0);
  return 0;
}
Which results in:
f(1.79769e+308) = 1
f(-1.79769e+308) = 0
f(2.22507e-308) = 0.5
f(0) = 0.5
Which is all as expected. (With IEEE doubles, exp(-x) overflows to inf for the huge negative input, and 1. / (1. + inf) still evaluates cleanly to 0, so even the extreme cases behave.)
It'll be nice to have a sigmoid layer in Caffe. The current MNIST LeNet example, if you look at it closely, is actually not LeNet but LeNet with the sigmoid replaced by ReLU. Having a SigmoidLayer would let us match standard baselines more strictly. I'll pull when the minor comments are addressed.
I agree. ReLU is probably more useful in practice, but it's always nice to have options to compare to.
Thanks for taking care of this! Merged.
I added a sigmoid layer.