Posted by Ilingis on 2021-12-11 21:37

Newton's Method for Optimization Problems

Unconstrained optimization algorithms fall into two classes: line-search methods and trust-region methods. Both work by approximating $f(\boldsymbol{x})$ locally, and the former are more widely used. Depending on how the search direction is chosen, line-search methods can be further divided into gradient methods (e.g., gradient descent, common in deep learning), Newton's method, quasi-Newton methods, subgradient methods, and so on.
The goal of this post is to introduce Newton's method. When people say "Newton's method" they usually mean Newton's method for finding roots of an equation, so we first review the root-finding version, then extend it to finding extrema of a one-variable function, and finally to finding extrema of a multivariate function.
1. Newton's Method for Root Finding (One Variable)

The roots of a complicated equation are hard to obtain directly, and Newton's method was originally an iterative scheme for finding the roots of $f(x)=0$. The idea: pick an initial value $x_1$ and approximate the function near $x_1$ by its first-order Taylor expansion

$$f(x)=f(x_1)+f'(x_1)(x-x_1)+\mathcal{O}\big((x-x_1)^{2}\big)$$
Setting $f(x)=0$ in the expansion (and dropping the remainder) gives the x-coordinate where the tangent line crosses the axis, $x=x_1-f(x_1)/f'(x_1)$. This point is not a root of the function, but it is closer to the true root than $x_1$. Repeating this update gives the iteration
$$x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}$$
When the change in the iterate falls below a threshold, the current value is taken as an approximate root of the function, and Newton's method is said to have converged.
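
A minimal sketch of this iteration in Python (the function names, tolerance, and the example equation $x^{2}-2=0$ are illustrative choices, not from the original post):

```python
def newton_root(f, df, x, tol=1e-10, max_iter=100):
    """Find a root of f by iterating x <- x - f(x)/f'(x)."""
    for _ in range(max_iter):
        step = f(x) / df(x)      # tangent-line correction at the current point
        x -= step
        if abs(step) < tol:      # iterate changed less than the threshold: converged
            break
    return x

# Example: root of x^2 - 2 = 0 starting from x = 1, converges to sqrt(2)
print(newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x=1.0))
```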
2. Newton's Method for Extrema (One Variable)

Newton's method can also be used to find extrema of a function. Finding an extremum of $f(x)$ amounts to finding a root of $f'(x)$, i.e., solving $f'(x)=0$ as described above. Given an initial value $x_1$, expand $f$ at $x_1$ to second order

$$f(x)=f(x_1)+f'(x_1)(x-x_1)+\frac{1}{2}f''(x_1)(x-x_1)^{2}$$
Differentiating the right-hand side with respect to $x$ and setting the derivative to zero gives $x=x_1-f'(x_1)/f''(x_1)$; iterating yields the recurrence
$$x_{n+1}=x_n-\frac{f'(x_n)}{f''(x_n)}$$
When the change in $x_n$ falls below a threshold, the algorithm is considered converged.
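
The same sketch applied to $f'$ instead of $f$ (again with hypothetical names and a toy objective of my choosing):

```python
def newton_extremum_1d(df, d2f, x, tol=1e-10, max_iter=100):
    """Find a stationary point of f by iterating x <- x - f'(x)/f''(x)."""
    for _ in range(max_iter):
        step = df(x) / d2f(x)    # the root-finding update, applied to f'
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x^2 - 2x has f'(x) = 2x - 2, f''(x) = 2, minimizer x = 1
print(newton_extremum_1d(lambda x: 2 * x - 2, lambda x: 2.0, x=5.0))
```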
3. Newton's Method for Extrema (Multivariate)

For a twice-differentiable function $f(\boldsymbol{x})$ defined on $\mathbb{R}^{n}$, consider its second-order Taylor expansion at $x_k$: take $x_k, d_k \in \mathbb{R}^{n}$ with $d_k$ approaching the zero vector, and expand

$$f(x_k+d_k)=f(x_k)+\nabla f(x_k)^{\mathrm{T}}d_k+\frac{1}{2}d_k^{\mathrm{T}}\nabla^{2}f(x_k)\,d_k+o\big(\|d_k\|^{2}\big)$$
Ignoring the higher-order remainder and viewing the right-hand side as a function of $d_k$, take its gradient with respect to $d_k$ and set it to zero, $\nabla_{d_k} f(x_k+d_k)=0$. Simplifying gives the Newton equation

$$\nabla^{2} f(x_k)\,d_k=-\nabla f(x_k)$$
From this, basic matrix algebra gives the search direction $d_k=-\nabla^{2} f(x_k)^{-1}\,\nabla f(x_k)$,
and hence the update rule

$$x_{k+1}=x_k-\nabla^{2} f(x_k)^{-1}\,\nabla f(x_k)$$
When the change in $x_k$ falls below a threshold, the algorithm is considered converged.
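
A sketch in Python/NumPy under these definitions (the quadratic test function and all names are illustrative). Note that the Newton equation is solved as a linear system rather than by forming the inverse Hessian explicitly, which is cheaper and numerically more stable:

```python
import numpy as np

def newton_extremum(grad, hess, x, tol=1e-10, max_iter=100):
    """Multivariate Newton: solve H(x_k) d_k = -g(x_k), then x_{k+1} = x_k + d_k."""
    for _ in range(max_iter):
        d = np.linalg.solve(hess(x), -grad(x))  # Newton equation, no explicit inverse
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

# Example: f(x) = 1/2 x^T A x - b^T x with A positive definite
# (gradient A x - b, Hessian A); Newton reaches the minimizer A^{-1} b in one step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(newton_extremum(lambda x: A @ x - b, lambda x: A, x=np.zeros(2)))
```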
4. Comparison

4.1 The Three Newton Updates

If we formally write the matrix inverse $\nabla^{2} f(x_k)^{-1}$ as a denominator and list the three updates above side by side, they all take exactly the same form (aside: hardly a surprise):
Root finding: $x_{n+1}=x_n-f(x_n)\,/\,f'(x_n)$

Extremum, one variable: $x_{n+1}=x_n-f'(x_n)\,/\,f''(x_n)$

Extremum, multivariate: $x_{k+1}=x_k-\nabla f(x_k)\,/\,\nabla^{2} f(x_k)$
4.2 Newton's Method vs. Gradient Descent

Gradient descent:

$$x_{k+1}=x_k-\alpha_k\nabla f(x_k)$$
Newton's method with a step size other than 1 (i.e., the Newton direction multiplied by $\alpha_k$):

$$x_{k+1}=x_k-\alpha_k\,\nabla^{2} f(x_k)^{-1}\,\nabla f(x_k)$$
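
A small comparison sketch on an ill-conditioned quadratic (the matrix, step sizes, and iteration counts below are arbitrary illustrative choices): gradient descent must keep $\alpha_k$ small enough for the steepest direction and therefore crawls along the flat one, while the Newton direction rescales the gradient by the inverse Hessian, so for a quadratic a full step ($\alpha_k=1$) lands on the minimizer exactly.

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 1/2 x^T A x, minimizer at the origin.
A = np.diag([100.0, 1.0])
grad = lambda x: A @ x
hess = lambda x: A

x_gd = np.array([1.0, 1.0])
alpha_gd = 1.0 / 100.0                    # step limited by the largest curvature
for _ in range(50):
    x_gd = x_gd - alpha_gd * grad(x_gd)   # gradient descent

x_nt = np.array([1.0, 1.0])
x_nt = x_nt - np.linalg.solve(hess(x_nt), grad(x_nt))   # one full Newton step

print(np.linalg.norm(x_gd))   # ~0.6: still far from the optimum in the flat direction
print(np.linalg.norm(x_nt))   # 0.0: Newton solves the quadratic in a single step
```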

5. References


[*] 《最优化计算方法》, 文再文
[*] 牛顿法求极值 (Zhihu answer)