Introduction to Hard Sigmoid

Hard Sigmoid is a nonlinear activation function that simplifies the standard Sigmoid function to make it cheaper to compute.

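Like the standard Sigmoid, the Hard Sigmoid activation squashes its input into the range \([0, 1]\), but it is cheaper to evaluate. This matters in deep learning: inside its linear region the slope is constant, which can help mitigate vanishing gradients there, and the simpler arithmetic speeds up computation, particularly in settings with strict limits on parameter count and model size.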

In principle, Hard Sigmoid is a simplification, or approximation, of the Sigmoid function.

The standard Sigmoid function is \( \sigma(x) = \cfrac{1}{1+e^{-x}} \), whereas Hardsigmoid approximates this curve with a piecewise linear function, which can be written as:

\[\begin{split} \operatorname{Hardsigmoid}(x) = \begin{cases} 0 & \text{if } x < -2.5 \\ 0.2x + 0.5 & \text{if } -2.5 \leq x \leq 2.5 \\ 1 & \text{if } x > 2.5 \end{cases} \end{split}\]
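  • In terms of output, Hardsigmoid returns \(0\) when the input is below \(-2.5\) and \(1\) when the input is above \(2.5\); between \(-2.5\) and \(2.5\) the output is the linear function \(0.2x + 0.5\) of the input. Hard Sigmoid therefore has a linear transition band in the middle and constant output at both ends.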

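  • In terms of computational efficiency, Hard Sigmoid has a clear advantage. Because it avoids the exponential, it is faster to compute than the standard Sigmoid, which makes it well suited to speed-critical settings such as deep learning on mobile devices or embedded systems.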

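  • In terms of use cases, Hard Sigmoid is typically used in deep learning models with strict requirements on model size and computation speed. In such settings the choice of activation function directly affects model performance and size. In the deeper layers of a network, for example, using Hard Sigmoid can reduce the computational load while largely preserving model performance.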

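  • In terms of trade-offs, although Hard Sigmoid improves on the standard Sigmoid in some respects, it has limitations of its own. Its output is not smooth: the function has kinks at \(x = \pm 2.5\), and its gradient is exactly zero outside the linear region, which can cause vanishing gradients in some cases. Whether Hard Sigmoid is a suitable activation function therefore has to be judged against the specific application and network architecture.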
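The gradient behaviour described in the last bullet can be made concrete with a small NumPy sketch. The helper name `hard_sigmoid_grad` is hypothetical, and assigning a gradient of 0 exactly at the breakpoints is an assumed convention, not something the text above specifies.

```python
import numpy as np

def hard_sigmoid(x, alpha=0.2, beta=0.5):
    """Piecewise-linear sigmoid approximation: clip(alpha * x + beta, 0, 1)."""
    return np.clip(alpha * np.asarray(x, dtype=float) + beta, 0.0, 1.0)

def hard_sigmoid_grad(x, alpha=0.2, beta=0.5):
    """Hypothetical helper: slope alpha inside the linear region, 0 elsewhere.

    Assumption: the gradient at the two breakpoints is taken to be 0.
    """
    z = alpha * np.asarray(x, dtype=float) + beta
    return np.where((z > 0.0) & (z < 1.0), alpha, 0.0)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
grad = hard_sigmoid_grad(x)

# Cross-check against a central finite difference (all points are away from the kinks).
eps = 1e-6
numeric_grad = (hard_sigmoid(x + eps) - hard_sigmoid(x - eps)) / (2 * eps)

print(grad)
print(np.allclose(grad, numeric_grad, atol=1e-6))
```

The inputs at \(x = \pm 5\) lie in the saturated regions, so they receive no gradient at all, while every input inside the linear band receives the same slope of 0.2.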

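In summary, Hard Sigmoid is an efficient approximation of the standard Sigmoid function: by simplifying the computation it speeds up model execution, which makes it particularly suitable where compute resources and speed are at a premium. When using HardSigmoid, however, keep its potential gradient problems in mind and choose the activation function to fit the problem at hand.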
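To get a feel for how close the approximation is, the following sketch measures the largest absolute gap between the standard Sigmoid and the \(0.2x + 0.5\) variant on a grid. The grid range and resolution are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    """Piecewise-linear approximation with slope 0.2 and offset 0.5."""
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

# Evaluate both functions on a dense grid (range chosen for illustration only).
x = np.linspace(-8.0, 8.0, 1601)
abs_err = np.abs(sigmoid(x) - hard_sigmoid(x))

max_err = abs_err.max()
x_at_max = x[abs_err.argmax()]
print(f"max |sigmoid - hard_sigmoid| = {max_err:.4f} at x = {x_at_max:.2f}")
```

On this grid the largest gap is roughly 0.076 and occurs at the breakpoints \(x = \pm 2.5\), where the hard variant has already saturated but the smooth Sigmoid has not.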

Implementing Hardsigmoid with NumPy/ONNX

Reference: onnx__HardSigmoid

A NumPy implementation:

import numpy as np

def hard_sigmoid(x):
    return np.clip(0.2*x + 0.5, 0, 1)

Example:

x = np.array([-7, -2.5, -2.4, 0, 2.4, 2.5, 22])
hard_sigmoid(x)
array([0.  , 0.  , 0.02, 0.5 , 0.98, 1.  , 1.  ])
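The ONNX `HardSigmoid` operator exposes the slope and offset as attributes `alpha` and `beta`, with defaults 0.2 and 0.5, and defines the output as `max(0, min(1, alpha * x + beta))`. A minimal NumPy sketch of that parametrised form follows; the function name `hard_sigmoid_onnx` and the choice of `float32` are assumptions made for illustration, not part of the ONNX API.

```python
import numpy as np

def hard_sigmoid_onnx(x, alpha=0.2, beta=0.5):
    """Hypothetical NumPy stand-in for the ONNX formula max(0, min(1, alpha * x + beta))."""
    x = np.asarray(x, dtype=np.float32)  # float32 is an assumed, typical tensor dtype
    return np.maximum(0.0, np.minimum(1.0, alpha * x + beta)).astype(np.float32)

# Default attributes reproduce the 0.2x + 0.5 variant.
y_default = hard_sigmoid_onnx([-7, -2.5, 0, 2.5, 22])

# Setting alpha to 1/6 moves the saturation points to -3 and 3.
y_sixth = hard_sigmoid_onnx([-3, 0, 3], alpha=1 / 6)

print(y_default)
print(y_sixth)
```

Keeping `alpha` and `beta` as parameters means one function covers both slope conventions.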

Plotting the function:

import plotly.graph_objects as go
x = np.linspace(-7, 7, 100)
y = hard_sigmoid(x)
fig = go.Figure(data=go.Scatter(x=x, y=y, mode='lines+markers'))
fig.update_layout(
    xaxis_title='x',
    yaxis_title='hard_sigmoid(x)'
)

Here \(0.2\) is an approximation of \(\cfrac{1}{6}\):

1/6
0.16666666666666666
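The slope determines where the function saturates: solving \(ax + 0.5 = 0\) and \(ax + 0.5 = 1\) gives \(x = \mp 0.5/a\). The sketch below computes these points for both slopes; `saturation_points` is a hypothetical helper introduced only for this illustration.

```python
def saturation_points(slope, offset=0.5):
    """Hypothetical helper: inputs where slope * x + offset reaches 0 and 1."""
    lower = (0.0 - offset) / slope
    upper = (1.0 - offset) / slope
    return lower, upper

for slope in (0.2, 1 / 6):
    lower, upper = saturation_points(slope)
    print(f"slope={slope:.4f}: linear region is [{lower:.2f}, {upper:.2f}]")
```

A slope of 0.2 gives a linear region of \([-2.5, 2.5]\), while a slope of \(1/6\) widens it to \([-3, 3]\).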

The implementation actually used in practice is as follows:

Implementing Hardsigmoid with NumPy/TensorFlow/PyTorch

Reference: tf.keras hard_sigmoid & torch.nn.Hardsigmoid

\[\begin{split} \operatorname{Hardsigmoid}(x) = \begin{cases} 0 & \text{if } x \le -3 \\ x/6 + 1/2 & \text{if } -3 \lt x \lt 3 \\ 1 & \text{if } x \ge 3 \end{cases} \end{split}\]

import numpy as np

def hard_sigmoid(x):
    return np.clip(x/6 + 1/2, 0, 1)

import plotly.graph_objects as go
x = np.linspace(-7, 7, 100)
y = hard_sigmoid(x)
fig = go.Figure(data=go.Scatter(x=x, y=y, mode='lines+markers'))
fig.update_layout(
    xaxis_title='x',
    yaxis_title='hard_sigmoid(x)'
)

np.clip?
Signature:       np.clip(a, a_min, a_max, out=None, **kwargs)
Call signature:  np.clip(*args, **kwargs)
Type:            _ArrayFunctionDispatcher
String form:     <function clip at 0x7f97c41f39c0>
File:            /media/pc/data/lxw/envs/anaconda3x/envs/py312/lib/python3.12/site-packages/numpy/core/fromnumeric.py
Docstring:      
Clip (limit) the values in an array.

Given an interval, values outside the interval are clipped to
the interval edges.  For example, if an interval of ``[0, 1]``
is specified, values smaller than 0 become 0, and values larger
than 1 become 1.

Equivalent to but faster than ``np.minimum(a_max, np.maximum(a, a_min))``.

No check is performed to ensure ``a_min < a_max``.

Parameters
----------
a : array_like
    Array containing elements to clip.
a_min, a_max : array_like or None
    Minimum and maximum value. If ``None``, clipping is not performed on
    the corresponding edge. Only one of `a_min` and `a_max` may be
    ``None``. Both are broadcast against `a`.
out : ndarray, optional
    The results will be placed in this array. It may be the input
    array for in-place clipping.  `out` must be of the right shape
    to hold the output.  Its type is preserved.
**kwargs
    For other keyword-only arguments, see the
    :ref:`ufunc docs <ufuncs.kwargs>`.

    .. versionadded:: 1.17.0

Returns
-------
clipped_array : ndarray
    An array with the elements of `a`, but where values
    < `a_min` are replaced with `a_min`, and those > `a_max`
    with `a_max`.

See Also
--------
:ref:`ufuncs-output-type`

Notes
-----
When `a_min` is greater than `a_max`, `clip` returns an
array in which all values are equal to `a_max`,
as shown in the second example.

Examples
--------
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.clip(a, 1, 8)
array([1, 1, 2, 3, 4, 5, 6, 7, 8, 8])
>>> np.clip(a, 8, 1)
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> np.clip(a, 3, 6, out=a)
array([3, 3, 3, 3, 4, 5, 6, 6, 6, 6])
>>> a
array([3, 3, 3, 3, 4, 5, 6, 6, 6, 6])
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.clip(a, [3, 4, 1, 1, 1, 4, 4, 4, 4, 4], 8)
array([3, 4, 2, 3, 4, 5, 6, 7, 8, 8])

There are other ways to write it:

\[ \operatorname{Hardsigmoid}(x) = \cfrac {\operatorname{HardTanh}(x + 3, 0., 6.)}{6} = \cfrac {\operatorname{ReLU6}(x+3)} {6} \]

or

\[ \operatorname{Hardsigmoid}(x) = {\operatorname{HardTanh}(x * 0.2 + 0.5, 0., 1.)} \approx {\operatorname{HardTanh}(x * 1/6 + 0.5, 0., 1.)} \]
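These alternative forms can be checked numerically. The sketch below assumes plain NumPy stand-ins for `ReLU6` and `HardTanh` (both reduce to a clip); they are illustrative helpers, not the framework implementations.

```python
import numpy as np

def relu6(x):
    """NumPy stand-in for ReLU6: clip to [0, 6]."""
    return np.clip(x, 0.0, 6.0)

def hardtanh(x, min_val, max_val):
    """NumPy stand-in for HardTanh with explicit bounds."""
    return np.clip(x, min_val, max_val)

def hard_sigmoid(x):
    """The x/6 + 1/2 variant written directly as a clip."""
    return np.clip(x / 6 + 1 / 2, 0, 1)

x = np.linspace(-7.0, 7.0, 141)
via_relu6 = relu6(x + 3.0) / 6.0
via_hardtanh = hardtanh(x + 3.0, 0.0, 6.0) / 6.0
via_clip = hard_sigmoid(x)

# The ReLU6 and HardTanh forms agree with the clip form up to rounding.
print(np.allclose(via_relu6, via_clip), np.allclose(via_hardtanh, via_clip))

# The 0.2-slope variant is only an approximation of the 1/6-slope variant.
gap = np.abs(hardtanh(0.2 * x + 0.5, 0.0, 1.0) - via_clip).max()
print(f"max gap between the two slope variants: {gap:.4f}")
```

The two slope variants differ most around \(x = \pm 2.5\), by about 0.083, so they are close but not interchangeable when exact parity with a given framework matters.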