The Math Behind Backpropagation
====================================================================================
Assume:
X = (x1, x2, x3)
Y = 2*X = (2*x1, 2*x2, 2*x3) = (y1, y2, y3)
Written out per component:
y1 = f1(x1, x2, x3) = 2*x1
y2 = f2(x1, x2, x3) = 2*x2
y3 = f3(x1, x2, x3) = 2*x3
---------------------------------------------------------------------------------------------------------------------------------------------------
1 Compute the Jacobian matrix of Y with respect to X
$$
J=
\begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1}&\dfrac{\partial y_1}{\partial x_2}&\dfrac{\partial y_1}{\partial x_3} \\[2.5ex]
\dfrac{\partial y_2}{\partial x_1}&\dfrac{\partial y_2}{\partial x_2}&\dfrac{\partial y_2}{\partial x_3} \\[2.5ex]
\dfrac{\partial y_3}{\partial x_1}&\dfrac{\partial y_3}{\partial x_2}&\dfrac{\partial y_3}{\partial x_3} \\[1.5ex]
\end{pmatrix}
$$
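As a sanity check on the Jacobian, here is a minimal sketch that approximates it numerically with central differences in plain Python. The helper names `f` and `numerical_jacobian`, the step size `eps`, and the sample point (1, 2, 3) are all assumptions made for illustration; this is not how an autograd engine computes derivatives.

```python
def f(x):
    """Y = 2 * X, applied component by component."""
    return [2.0 * xi for xi in x]


def numerical_jacobian(func, x, eps=1e-6):
    """Approximate J[j][i] = d y_j / d x_i using central differences."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for i in range(len(x)):
        # Perturb only the i-th input, in both directions.
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        y_plus = func(x_plus)
        y_minus = func(x_minus)
        for j in range(n_out):
            jac[j][i] = (y_plus[j] - y_minus[j]) / (2 * eps)
    return jac


J = numerical_jacobian(f, [1.0, 2.0, 3.0])
for row in J:
    print([round(value, 6) for value in row])
```

For Y = 2*X every diagonal entry should come out close to 2 and every off-diagonal entry should be 0, since each y_j depends only on x_j.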
2 Compute the sum of partial derivatives for each component of X (the accumulated sum over the directions that v projects onto)
$$
\begin{aligned}
\dfrac{dY}{dx_1} &= \dfrac{\partial y_1}{\partial x_1}+\dfrac{\partial y_2}{\partial x_1}+\dfrac{\partial y_3}{\partial x_1}\\[2.5ex]
\dfrac{dY}{dx_2} &= \dfrac{\partial y_1}{\partial x_2}+\dfrac{\partial y_2}{\partial x_2}+\dfrac{\partial y_3}{\partial x_2}\\[2.5ex]
\dfrac{dY}{dx_3} &= \dfrac{\partial y_1}{\partial x_3}+\dfrac{\partial y_2}{\partial x_3}+\dfrac{\partial y_3}{\partial x_3}
\end{aligned}
$$
3 Assemble the final gradient expression from the per-component sums
$$
\dfrac{dY}{dX}=
\begin{pmatrix}
\dfrac{dY}{dx_1}&\dfrac{dY}{dx_2}&\dfrac{dY}{dx_3}
\end{pmatrix}
$$
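To make this concrete for the running example, here is the same computation with the numbers filled in. Since y_j = 2*x_j, the partial derivative of y_j with respect to x_i is 2 when i = j and 0 otherwise, so each column of the Jacobian sums to 2:

$$
J=
\begin{pmatrix}
2&0&0\\
0&2&0\\
0&0&2
\end{pmatrix},
\qquad
\dfrac{dY}{dX}=
\begin{pmatrix}
2+0+0 & 0+2+0 & 0+0+2
\end{pmatrix}
=
\begin{pmatrix}
2&2&2
\end{pmatrix}
$$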
4 y.backward(v): the computation depends on whether the argument v is passed
If v = (m, n, q), each partial derivative in the sums is multiplied by the projection value (weight) of its output component: m for y1, n for y2, q for y3.
For example, with v = (1, 2, 3), the terms belonging to y1, y2, y3 are multiplied by 1, 2, 3 respectively. Taking x1 as an example:
$$
\dfrac{dY}{dx_1}= 1*\dfrac{\partial y_1}{\partial x_1}+2*\dfrac{\partial y_2}{\partial x_1}+3*\dfrac{\partial y_3}{\partial x_1}
$$
More generally, for each x_i:
$$
\dfrac{dY}{dx_i}= m*\dfrac{\partial y_1}{\partial x_i}+n*\dfrac{\partial y_2}{\partial x_i}+q*\dfrac{\partial y_3}{\partial x_i}
$$
So when v = (1, 1, 1), the weights change nothing: the result equals the plain sum of partial derivatives for each x_i.
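The weighting can be checked directly in PyTorch. The sketch below assumes the sample point X = (1, 2, 3); the helper name `grad_with_weights` is hypothetical and exists only for this illustration. One caveat, as far as I know: in recent PyTorch versions, calling `y.backward()` with no argument on a non-scalar `y` raises a RuntimeError, so the v = (1, 1, 1) case has to be passed explicitly (or written as `y.sum().backward()`).

```python
import torch


def grad_with_weights(v_values):
    """Return dY/dX for Y = 2 * X, with the outputs weighted by v."""
    # Sample point (1, 2, 3) is an assumption chosen for illustration.
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    y = 2 * x
    v = torch.tensor(v_values)
    # backward(v) accumulates sum_j v_j * (d y_j / d x_i) into x.grad.
    y.backward(v)
    return x.grad.tolist()


print(grad_with_weights([1.0, 2.0, 3.0]))  # [2.0, 4.0, 6.0]
print(grad_with_weights([1.0, 1.0, 1.0]))  # [2.0, 2.0, 2.0]
```

With v = (1, 2, 3) each diagonal entry 2 of the Jacobian is scaled by its weight, giving (2, 4, 6); with v = (1, 1, 1) the result is the unweighted (2, 2, 2).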
===================================
If you spot any mistakes, please point them out in the comments; corrections are welcome.