
Conversation

@tdomhan (Contributor) commented Feb 13, 2014

I added a sigmoid layer.

@kloudkl (Contributor) commented Feb 13, 2014

Several recent studies all recommend using ReLU. My own micro-benchmark experiments reach a similar conclusion: ReLU is not only several times faster, it also often produces better accuracy (a sketch of such a benchmark follows the references below).

[1] Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS 2012, Lake Tahoe, Nevada.
[2] Dahl, G. E., Sainath, T. N., and Hinton, G. E. Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout. In ICASSP 2013.
[3] Sainath, T. N., Kingsbury, B., Mohamed, A., Dahl, G. E., Saon, G., Soltau, H., Beran, T., Aravkin, A. Y., and Ramabhadran, B. Improvements to Deep Convolutional Neural Networks for LVCSR. In ASRU 2013.
[4] Zeiler, M. D., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q. V., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., and Hinton, G. E. On Rectified Linear Units for Speech Processing. In ICASSP 2013, Vancouver.
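
For what it's worth, here is a minimal sketch of the kind of single-threaded micro-benchmark meant above; the array size, input values, and timing scheme are illustrative, not the exact experiments:

#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
  const size_t n = 10 * 1000 * 1000;  // illustrative array size
  std::vector<float> in(n), out(n);
  for (size_t i = 0; i < n; ++i)
    in[i] = 0.001f * static_cast<float>(i % 2000) - 1.f;  // inputs in [-1, 1)

  // Time an elementwise sigmoid pass.
  auto t0 = std::chrono::steady_clock::now();
  for (size_t i = 0; i < n; ++i)
    out[i] = 1.f / (1.f + std::exp(-in[i]));
  auto t1 = std::chrono::steady_clock::now();
  float sig_sum = 0.f;
  for (size_t i = 0; i < n; ++i) sig_sum += out[i];  // keep the result live

  // Time an elementwise ReLU pass.
  auto t2 = std::chrono::steady_clock::now();
  for (size_t i = 0; i < n; ++i)
    out[i] = std::max(in[i], 0.f);
  auto t3 = std::chrono::steady_clock::now();
  float relu_sum = 0.f;
  for (size_t i = 0; i < n; ++i) relu_sum += out[i];

  using ms = std::chrono::duration<double, std::milli>;
  std::cout << "sigmoid: " << ms(t1 - t0).count() << " ms (sum " << sig_sum << ")\n"
            << "relu:    " << ms(t3 - t2).count() << " ms (sum " << relu_sum << ")\n";
  return 0;
}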

Member

Should put your copyright :)

Member

I am not sure about this, but could this lead to errors when, for example, x is a very large negative number? An if statement that handles the x > 0 and x < 0 cases separately might help (although it may hurt performance).
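
For reference, a minimal sketch of the branching variant being suggested (the name stable_sigmoid and the test values are just illustrative): both branches only ever pass a non-positive argument to exp(), so the intermediate value cannot overflow.

#include <cmath>
#include <iostream>

inline double stable_sigmoid(double x) {
  if (x >= 0) {
    return 1. / (1. + std::exp(-x));  // exp argument is <= 0
  } else {
    double e = std::exp(x);  // x < 0, so e is in (0, 1)
    return e / (1. + e);     // algebraically equal to 1 / (1 + exp(-x))
  }
}

int main() {
  std::cout << stable_sigmoid(-1000.) << " " << stable_sigmoid(0.) << " "
            << stable_sigmoid(1000.) << std::endl;  // prints 0 0.5 1
}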

Contributor Author

At least empirically we should be fine. I ran the following program:

#include <cmath>
#include <iostream>
#include <limits>

// Evaluate the sigmoid and print the result.
inline double sigmoid(double x) {
  double val = 1. / (1. + exp(-x));
  std::cout << "f(" << x << ") = " << val << std::endl;
  return val;
}

int main() {
  sigmoid(std::numeric_limits<double>::max());
  sigmoid(-std::numeric_limits<double>::max());
  sigmoid(std::numeric_limits<double>::min());
  sigmoid(0);
}

Which results in:

f(1.79769e+308) = 1
f(-1.79769e+308) = 0
f(2.22507e-308) = 0.5
f(0) = 0.5

Which is all as expected: for the large negative input, exp(-x) overflows to +inf and 1. / (1. + inf) evaluates to exactly 0 under IEEE 754 arithmetic, so no NaN shows up.

@Yangqing (Member)

It'll be nice to have a sigmoid layer in Caffe. The current MNIST LeNet example, if you look at it closely, is actually not LeNet but LeNet with the sigmoid replaced by ReLU. Having a SigmoidLayer would allow us to match standard baselines more strictly. I'll pull when the minor comments are addressed.

@tdomhan (Contributor Author) commented Feb 13, 2014

I agree. ReLU is probably more useful in practice, but it's always nice to have options to compare against.
I added it because I'm interested in multi-label classification, for which I would use a sigmoid as the final layer.
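
As an illustration (not part of this PR), a minimal sketch of why an elementwise sigmoid fits multi-label classification: each output acts as an independent probability, so several labels can be predicted at once, unlike softmax whose outputs sum to 1. The scores and the 0.5 threshold below are made-up examples.

#include <cmath>
#include <iostream>
#include <vector>

int main() {
  std::vector<double> scores = {2.0, -1.5, 0.3};  // one score per label
  for (size_t i = 0; i < scores.size(); ++i) {
    double p = 1. / (1. + std::exp(-scores[i]));  // independent per-label probability
    std::cout << "label " << i << ": p = " << p
              << (p > 0.5 ? "  (predicted present)" : "") << "\n";
  }
  return 0;
}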

Yangqing added a commit that referenced this pull request Feb 13, 2014
@Yangqing merged commit 89a0e8e into BVLC:master on Feb 13, 2014
@Yangqing (Member)

Thanks for taking care of this! Merged.

mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
cypof pushed a commit that referenced this pull request Sep 19, 2017