Differentiation

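Given a matrix \(\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_m]^{\top} \in \mathbb{R}^{m \times n}\), whose rows are \(\mathbf{a}_i^{\top}\) with \(\mathbf{a}_i \in \mathbb{R}^{n}\), and a vector \(\mathbf{x} = [x_1, x_2, \cdots, x_n]^{\top} \in \mathbb{R}^{n}\), we have: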

\[\begin{split} \mathbf{Ax} = \begin{bmatrix} \langle \mathbf{a}_1, \mathbf{x} \rangle\\ \langle \mathbf{a}_2, \mathbf{x} \rangle\\ \vdots \\ \langle \mathbf{a}_m, \mathbf{x} \rangle \end{bmatrix} \end{split}\]
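The row-wise view above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `inner` and `matvec_by_rows` and the sample values are assumptions chosen for illustration, not part of the original text.

```python
def inner(u, v):
    """Inner product <u, v> of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec_by_rows(A, x):
    """Compute Ax, with A given as a list of its rows a_1, ..., a_m."""
    return [inner(a_i, x) for a_i in A]


# Illustrative values: m = 2 rows, n = 3 columns.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]

result = matvec_by_rows(A, x)
print(result)  # [-2.0, -2.0]
```

Each entry of the result is one inner product \(\langle \mathbf{a}_i, \mathbf{x} \rangle\), so the output has \(m\) entries.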

Derivatives of vector-valued functions

For \(\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^m\), the gradient is written:

\[ \nabla_{\mathbf{x}} \mathbf{f(x)} = \bigg[\frac{\partial \mathbf{f(x)}}{\partial x_1}, \frac{\partial \mathbf{f(x)}}{\partial x_2}, \ldots, \frac{\partial \mathbf{f(x)}}{\partial x_n}\bigg]^\top \]

If \(\mathbf{y} = [y_1, y_2, \ldots, y_m]^\top\) and \(x \in \mathbb{R}\), then:

\[ \frac{\partial \mathbf{y}}{\partial x} = \bigg[\frac{\partial y_1}{\partial x}, \frac{\partial y_2}{\partial x}, \ldots, \frac{\partial y_m}{\partial x}\bigg]^\top \]

Similarly, for a vector \(\mathbf{x} \in \mathbb{R}^n\):

\[\begin{split} \frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \bigg[\frac{\partial y_1}{\partial \mathbf{x}}, \frac{\partial y_2}{\partial \mathbf{x}}, \ldots, \frac{\partial y_m}{\partial \mathbf{x}}\bigg]^\top = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} &\cdots &\frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix} \end{split}\]

This matrix, the Jacobian of \(\mathbf{y} = \mathbf{f(x)}\), can be written \(\mathbf{f^{'}(x)}\) or \(\mathbf{D} \mathbf{f(x)}\). It follows that:

\[\begin{split} \nabla_\mathbf{x} \mathbf{f(x)} = \mathbf{f^{'}(x)}^{\top}\\ \end{split}\]
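To make the shapes concrete, here is a small sketch that approximates the \(m \times n\) Jacobian by central differences and then transposes it to obtain the gradient in the convention above. The helpers `jacobian_fd` and `transpose`, the step size `h`, and the example function `f` are all assumptions introduced for illustration.

```python
def jacobian_fd(f, x, h=1e-6):
    """Approximate the m x n Jacobian of f at x by central differences."""
    n = len(x)
    columns = []
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        # Column j holds the partial derivatives with respect to x_j.
        columns.append([(p - q) / (2 * h) for p, q in zip(f_plus, f_minus)])
    m = len(columns[0])
    # Rearrange so that entry (i, j) is d y_i / d x_j.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def f(x):
    # Hypothetical example: f(x1, x2) = (x1 * x2, x1 + x2, x1 ** 2)
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


J = jacobian_fd(f, [2.0, 3.0])  # 3 x 2, close to [[3, 2], [1, 1], [4, 0]]
grad = transpose(J)             # 2 x 3, the gradient in this convention
```

Here `J` has one row per output component and one column per input variable, while `grad` swaps the two.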

Example 1. Compute \(\nabla_{\mathbf{x}} ||\mathbf{x}||^2\).

Taking the differential:

\[\begin{aligned} \mathbf{d} ||\mathbf{x}||^2 &= \mathbf{d} \langle \mathbf{x}, \mathbf{x} \rangle\\ &= \mathbf{d} \sum_{i=1}^n x_i^2\\ &= \sum_{i=1}^n \mathbf{d} x_i^2\\ &= 2 \sum_{i=1}^n x_i \mathbf{d} x_i\\ &= 2 \mathbf{x}^{\top} \mathbf{d} \mathbf{x} \end{aligned}\]

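Therefore, comparing with \(\mathbf{d} f = (\nabla_{\mathbf{x}} f)^{\top} \mathbf{d} \mathbf{x}\), we obtain \(\nabla_{\mathbf{x}} ||\mathbf{x}||^2 = 2 \mathbf{x}\).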
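The result of Example 1 can be sanity-checked with a finite-difference approximation. This is only a sketch: `sq_norm`, `gradient_fd`, the step size `h`, and the test point are assumptions for illustration.

```python
def sq_norm(x):
    """Squared Euclidean norm ||x||^2."""
    return sum(xi * xi for xi in x)


def gradient_fd(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


x = [1.0, -2.0, 0.5]
numeric = gradient_fd(sq_norm, x)
exact = [2 * xi for xi in x]  # the closed form 2x derived above

# The two should agree up to floating-point rounding.
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print(max_error < 1e-6)
```

Because the function is quadratic, the central difference has no truncation error, so any discrepancy comes from rounding alone.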

Extension

Working directly with the inner product simplifies the derivation. For example:

\[\begin{aligned} \mathbf{d} ||\mathbf{x}||^2 &= \langle \mathbf{d} \mathbf{x}, \mathbf{x} \rangle + \langle \mathbf{x}, \mathbf{d} \mathbf{x} \rangle\\ &= \langle 2 \mathbf{x}, \mathbf{d} \mathbf{x} \rangle\\ &= \langle \nabla_{\mathbf{x}} ||\mathbf{x}||^2, \mathbf{d} \mathbf{x} \rangle \end{aligned}\]
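As a further illustration of the same technique (an added worked example, not from the original text), consider \(||\mathbf{Ax}||^2\) with a constant matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\). It assumes \(\mathbf{d}(\mathbf{Ax}) = \mathbf{A} \, \mathbf{d} \mathbf{x}\) and the identity \(\langle \mathbf{u}, \mathbf{A} \mathbf{v} \rangle = \langle \mathbf{A}^{\top} \mathbf{u}, \mathbf{v} \rangle\):

\[\begin{aligned} \mathbf{d} ||\mathbf{Ax}||^2 &= \langle \mathbf{A} \, \mathbf{d} \mathbf{x}, \mathbf{Ax} \rangle + \langle \mathbf{Ax}, \mathbf{A} \, \mathbf{d} \mathbf{x} \rangle\\ &= \langle 2 \mathbf{Ax}, \mathbf{A} \, \mathbf{d} \mathbf{x} \rangle\\ &= \langle 2 \mathbf{A}^{\top} \mathbf{Ax}, \mathbf{d} \mathbf{x} \rangle \end{aligned}\]

Reading off the first argument of the inner product suggests \(\nabla_{\mathbf{x}} ||\mathbf{Ax}||^2 = 2 \mathbf{A}^{\top} \mathbf{Ax}\), which reduces to \(2 \mathbf{x}\) when \(\mathbf{A}\) is the identity.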