Activation function
[Figure: Artificial neuron model (ArtificialNeuronModel_english.png)]
In an artificial neural network model, the main job of a neuron is to compute NET, the weighted sum of its inputs and connection strengths, and then to produce an output through an activation function. The choice of activation function can therefore change the output the neuron produces.
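As a minimal sketch of this computation (the names `neuron_output`, `inputs`, and `weights` are illustrative, not from the source), NET is formed as a dot product and then passed through whichever activation function is chosen:

```python
import numpy as np

def neuron_output(inputs, weights, activation):
    """Compute NET as the weighted sum of inputs, then apply the activation function."""
    net = np.dot(weights, inputs)  # NET = sum_i w_i * x_i
    return activation(net)

# The same NET value produces different outputs under different activations.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, 0.3])
print(neuron_output(x, w, np.tanh))                      # bipolar, continuous
print(neuron_output(x, w, lambda net: float(net >= 0)))  # binary step
```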
An activation function should be monotonically increasing, and activation functions are generally classified as follows. Because these categories overlap, the description here focuses on commonly used functions rather than drawing strict boundaries between them.
Type
- Unipolar / bipolar functions
- Linear / nonlinear functions
- Continuous / binary functions
Sigmoid function
The sigmoid function is a unipolar or bipolar nonlinear continuous function, as shown in the figure below, and is the most widely used activation function in neural network models. Because of its S-shaped graph it is also called an S-curve. See the Sigmoid function article for details.
[Figure: Logistic curve (Logistic-curve.svg.png)]
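A small sketch of the two common sigmoid variants (the function names here are illustrative, not from the source): the unipolar logistic function maps into (0, 1), while the bipolar tanh maps into (-1, 1):

```python
import numpy as np

def logistic(x):
    """Unipolar sigmoid: S-shaped, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    """Bipolar sigmoid (tanh): S-shaped, output in (-1, 1)."""
    return np.tanh(x)

xs = np.linspace(-6.0, 6.0, 7)
print(logistic(xs))         # rises smoothly from ~0 to ~1
print(bipolar_sigmoid(xs))  # rises smoothly from ~-1 to ~1
```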
Effective activation functions
- ReLU / Rectified-Linear and Leaky-ReLU
- Sigmoid function
- TanH / Hyperbolic Tangent
- Absolute Value
- Power
- BNLL
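A rough sketch of a few entries from the list above (the helper names are illustrative, and the BNLL definition as \(\log(1+e^x)\) is an assumption, not stated in the source):

```python
import numpy as np

def relu(x):
    """Rectified linear: max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: small nonzero slope for x < 0 instead of a hard zero."""
    return np.where(x >= 0, x, negative_slope * x)

def bnll(x):
    """BNLL, assumed here to be log(1 + e^x), a softplus-style transform."""
    return np.log1p(np.exp(x))

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0. 0. 0. 2.]
print(leaky_relu(x))  # [-0.03 -0.005 0. 2.]
print(bnll(x))
```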
Comparison of activation functions
Some desirable properties in an activation function include:
- Nonlinear: When the activation function is non-linear, then a two-layer neural network can be proven to be a universal function approximator. The identity activation function does not satisfy this property. When multiple layers use the identity activation function, the entire network is equivalent to a single-layer model.
- Continuously differentiable: This property is necessary for enabling gradient-based optimization methods. The binary step activation function is not differentiable at 0, and its derivative is 0 everywhere else, so gradient-based methods can make no progress with it.
- Monotonic: When the activation function is monotonic, the error surface associated with a single-layer model is guaranteed to be convex.
- \(f(x)\approx x\) when \(x \approx 0\): This property enables the neural network to train efficiently when its weights are initialized with small random values. When the activation function does not satisfy this property, special care must be used when initializing the weights.
- Range: When the range of the activation function is finite, gradient-based training methods tend to be more stable, because pattern presentations significantly affect only limited weights. When the range is infinite, training is generally more efficient because pattern presentations significantly affect most of the weights. In the latter case, smaller learning rates are typically necessary.
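For example, the \(f(x)\approx x\) property can be checked numerically: tanh stays close to the identity near 0, while the logistic function sits near 0.5 (a small illustrative check, not from the source):

```python
import numpy as np

# tanh approximates the identity near 0 (tanh(0) = 0, tanh'(0) = 1), so small
# random initial weights keep the network in a nearly linear regime; the
# logistic function does not (logistic(0) = 0.5), so its weights need more
# careful initialization.
x = np.array([-0.1, -0.01, 0.01, 0.1])
print(np.tanh(x))                # close to x itself
print(1.0 / (1.0 + np.exp(-x)))  # close to 0.5, not to x
```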
The following table compares the properties of several activation functions:
| Name | Equation | Derivative | Monotonic | \(f(x)\approx x\) when \(x \approx 0\) | Range |
| --- | --- | --- | --- | --- | --- |
| Identity | \(f(x)=x\) | \(f'(x)=1\) | Yes | Yes | \((-\infty,\infty)\) |
| Binary step | \(f(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ 1 & \mbox{for} & x \ge 0\end{array} \right.\) | \(f'(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x \ne 0\\ ? & \mbox{for} & x = 0\end{array} \right.\) | Yes | No | \(\{0,1\}\) |
| Logistic (a.k.a. Soft step) | \(f(x)=\frac{1}{1+e^{-x}}\) | \(f'(x)=f(x)(1-f(x))\) | Yes | No | \((0,1)\) |
| TanH | \(f(x)=\tanh(x)=\frac{2}{1+e^{-2x}}-1\) | \(f'(x)=1-f(x)^2\) | Yes | Yes | \((-1,1)\) |
| ArcTan | \(f(x)=\tan^{-1}(x)\) | \(f'(x)=\frac{1}{x^2+1}\) | Yes | Yes | \((-\frac{\pi}{2},\frac{\pi}{2})\) |
| ReLU (Rectified Linear Unit) | \(f(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ x & \mbox{for} & x \ge 0\end{array} \right.\) | \(f'(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ 1 & \mbox{for} & x \ge 0\end{array} \right.\) | Yes | No | \([0,\infty)\) |
| SoftPlus | \(f(x)=\log_e(1+e^x)\) | \(f'(x)=\frac{1}{1+e^{-x}}\) | Yes | No | \((0,\infty)\) |
| Bent identity | \(f(x)=\frac{\sqrt{x^2 + 1} - 1}{2} + x\) | \(f'(x)=\frac{x}{2\sqrt{x^2 + 1}} + 1\) | Yes | Yes | \((-\infty,\infty)\) |
| SoftExponential | \(f(\alpha,x) = \left \{ \begin{array}{rcl} -\frac{\log_e(1-\alpha (x + \alpha))}{\alpha} & \mbox{for} & \alpha < 0\\ x & \mbox{for} & \alpha = 0\\ \frac{e^{\alpha x} - 1}{\alpha} + \alpha & \mbox{for} & \alpha > 0\end{array}\right.\) | \(f'(\alpha,x) = \left \{ \begin{array}{rcl} \frac{1}{1-\alpha (\alpha + x)} & \mbox{for} & \alpha < 0\\ e^{\alpha x} & \mbox{for} & \alpha \ge 0\end{array} \right.\) | Yes | Yes iff \(\alpha\approx0\) | \((-\infty,\infty)\) |
| Sinusoid | \(f(x)=\sin(x)\) | \(f'(x)=\cos(x)\) | No | Yes | \([-1,1]\) |
| Sinc | \(f(x)=\left \{ \begin{array}{rcl} 1 & \mbox{for} & x = 0\\ \frac{\sin(x)}{x} & \mbox{for} & x \ne 0\end{array} \right.\) | \(f'(x)=\left \{ \begin{array}{rcl} 0 & \mbox{for} & x = 0\\ \frac{\cos(x)}{x} - \frac{\sin(x)}{x^2} & \mbox{for} & x \ne 0\end{array} \right.\) | No | No | \([\approx-0.217234,1]\) |
| Gaussian | \(f(x)=e^{-x^2}\) | \(f'(x)=-2xe^{-x^2}\) | No | No | \((0,1]\) |
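A quick way to sanity-check the Equation and Derivative columns is a finite-difference comparison; the sketch below (illustrative helper names, not from the source) verifies a few rows of the table:

```python
import numpy as np

def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

logistic = lambda x: 1.0 / (1.0 + np.exp(-x))
rows = {
    "Logistic": (logistic,                      lambda x: logistic(x) * (1.0 - logistic(x))),
    "TanH":     (np.tanh,                       lambda x: 1.0 - np.tanh(x) ** 2),
    "ArcTan":   (np.arctan,                     lambda x: 1.0 / (x ** 2 + 1.0)),
    "SoftPlus": (lambda x: np.log1p(np.exp(x)), logistic),
}
x = np.linspace(-2.0, 2.0, 9)
for name, (f, f_prime) in rows.items():
    assert np.allclose(numeric_derivative(f, x), f_prime(x), atol=1e-5), name
print("derivative formulas agree with finite-difference estimates")
```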
See also
- Perceptron
- Sigmoid function
- Neural network
- Combination function
- Loss function
- Error function