数值优化(四)——线搜索算法的收敛性
所谓算法的收敛性,就是通过迭代,算法一定能逼近最优解。我们前面讨论了线搜索算法的方向和步长,总结起来便是说每一次迭代都要能保证目标函数充分下降,既然每次都是下降的,而一定有下界,直觉上来说那么肯定是收敛的,但其实这样说并不严谨。举个例子,假设目标函数是 https://www.zhihu.com/equation?tex=f%28x%29%3Dx%5E2 的抛物线函数,显然最优解 https://www.zhihu.com/equation?tex=x%5E%2A%3D0%2Cf%28x%5E%2A%29%3D0 ,假设我们从初始值 https://www.zhihu.com/equation?tex=x_0%3D2 出发,不排除算法会一步步这样迭代: https://www.zhihu.com/equation?tex=x_1%3D1.5%2C+x_2%3D1.25%2Cx_3%3D1.125%2C%5Ccdots%2Cx_k%3D1%2B%5Cfrac%7B1%7D%7B2%5Ek%7D%2C%5Ccdots ,每一次迭代 https://www.zhihu.com/equation?tex=x_k 都更加靠近,目标函数值也都会下降,但显然只能收敛到 https://www.zhihu.com/equation?tex=x%3D1%2Cf%28x%29%3D1 ,所以算法是不是真的能够到达最优解,还需要严格的证明。收敛性的数学形式
我们说收敛的最优解,很简单的直接要求:
https://www.zhihu.com/equation?tex=+%5Clim_%7Bk%5Crightarrow+%5Cinfty%7Dx_k%3Dx%5E%2A+
或者说
https://www.zhihu.com/equation?tex=+%5Clim_%7Bk%5Crightarrow+%5Cinfty%7Df%28x_k%29%3Df%28x%5E%2A%29+
这样就说明算法能够到达最优解了,但有个问题是我们并不知道或者 https://www.zhihu.com/equation?tex=f%28x%5E%2A%29 是多少(否则也不用求了),这个条件并不好利用,但有一点我们是知道的,那就是 https://www.zhihu.com/equation?tex=%5Cnabla+f%28x%5E%2A%29%3D0 ,我们更常用这一点,如果算法满足:
https://www.zhihu.com/equation?tex=+%5Clim_%7Bk%5Crightarrow+%5Cinfty%7D%5C%7C%5Cnabla+f%28x_k%29%5C%7C%3D%5C%7C%5Cnabla+f%28x%5E%2A%29%5C%7C%3D0+%5Cqquad+%281%29+
我们就说算法是收敛的。
Zoutendijk条件
我们将收敛性证明分为两步,首先证明在一定条件下可以满足Zoutendijk条件,它的形式如下:
https://www.zhihu.com/equation?tex=+%5Clim_%7Bk%5Crightarrow+%5Cinfty%7D+%5Ccos%5E2%5Ctheta_k%5C%7C%5Cnabla+f_k%5C%7C%5E2%3D0+%5Cqquad+%282%29+
其中 https://www.zhihu.com/equation?tex=%5Ccos%5Ctheta_k%3D%5Cfrac%7B-%5Cnabla+f_k%5ET+p_k%7D%7B%5C%7C%5Cnabla+f_k%5C%7C+%5C%7Cp_k%5C%7C%7D ,为与的夹角余弦。
Lipschitz连续
Lipschitz连续是比连续性要求更强一点的连续,对于来说,不光要求其连续,而且要求存在 https://www.zhihu.com/equation?tex=L 满足:
https://www.zhihu.com/equation?tex=+%5C%7Cf%28x%29-f%28%5Chat%7Bx%7D%29%5C%7C%5Cleq+L%5C%7Cx-%5Chat%7Bx%7D%5C%7C%2C+%5Cqquad%5Cforall+x%2C%5Chat%7Bx%7D+
其实变换成这种形式, https://www.zhihu.com/equation?tex=%5Cfrac%7B%5C%7Cf%28x%29-f%28%5Chat%7Bx%7D%29%5C%7C%7D%7B%5C%7Cx-%5Chat%7Bx%7D%5C%7C%7D%5Cleq+L ,就是说的梯度不能有无穷大的地方,总之是有个上界的。
同样的,这是对一阶来说的,还可以要求满足Lipschitz连续:
https://www.zhihu.com/equation?tex=+%5C%7C%5Cnabla+f%28x%29-+%5Cnabla+f%28%5Chat%7Bx%7D%29%5C%7C%5Cleq+L%5C%7Cx-%5Chat%7Bx%7D%5C%7C%5Cqquad+%283%29+
前提条件
对于一个优化问题和算法来说,不是在任何情况下都满足Zoutendijk条件的,我们想要证明Zoutendijk条件,首先需要整理清楚前提条件。第一点是我们考虑的定义域,前面学习过,迭代过程中要求每一步都是下降的,所以我们只需要考虑满足 https://www.zhihu.com/equation?tex=f%28x%29%5Cleq+f%28x_0%29 的区域, https://www.zhihu.com/equation?tex=x_0 是我们的初始点,以后每次迭代都不会超过初始点,记作水平集 https://www.zhihu.com/equation?tex=%5Cmathcal%7BL%7D%3D%7Bx%3Af%28x%29%5Cleq+f%28x_0%29%7D ,但并不一定是连续的一块,我们将包含的连续开区间记为,这就是我们考虑的范围。
在开区间上,记 https://www.zhihu.com/equation?tex=x_%7Bk%2B1%7D%3Dx_k%2B%5Calpha_k+p_k ,我们有以下几点要求,
[*] 满足Lipschitz连续,即公式 https://www.zhihu.com/equation?tex=%283%29
[*] 是下降方向,即 https://www.zhihu.com/equation?tex=%5Cnabla+f_k%5ETp_k%3C0%5Cqquad%284%29
[*]https://www.zhihu.com/equation?tex=%5Calpha_k 满足wolfe条件,即:
https://www.zhihu.com/equation?tex=f_%7Bk%2B1%7D%3Cf_k%2Bc_1%5Calpha_k%5Cnabla+f_k%5ETp_k%5Cqquad+%285%29
https://www.zhihu.com/equation?tex=%5Cnabla+f_%7Bk%2B1%7D%5ETp_k%5Cgeq+c_2%5Cnabla+f_k%5ET+p_k+%5Cqquad%286%29
这几点要求对大多数优化问题来说都是比较容易满足的。
Zoutendijk条件证明
首先根据公式 https://www.zhihu.com/equation?tex=%286%29 有:
https://www.zhihu.com/equation?tex=+%28%5Cnabla+f_%7Bk%2B1%7D-%5Cnabla+f_k%29%5ETp_k%5Cgeq%28c_2-1%29%28%5Cnabla+f_k%5ET+p_k%29+%5Cqquad+%287%29+
又根据公式(3),有:
https://www.zhihu.com/equation?tex=+%5C%7C%5Cnabla+f_%7Bk%2B1%7D-%5Cnabla+f_k%5C%7C%5Cleq+%5Calpha_k+L+%5C%7Cp_k%5C%7C+
两边同时乘上 https://www.zhihu.com/equation?tex=%5C%7Cp_k%5C%7C :
https://www.zhihu.com/equation?tex=+%5C%7C%5Cnabla+f_%7Bk%2B1%7D-%5Cnabla+f_k%5C%7C%5C%7Cp_k%5C%7C%5Cleq+%5Calpha_k+L+%5C%7Cp_k%5C%7C%5E2+
根据向量内积知 https://www.zhihu.com/equation?tex=%28%5Cnabla+f_%7Bk%2B1%7D-%5Cnabla+f_k%29%5ETp_k%5Cleq%5C%7C%5Cnabla+f_%7Bk%2B1%7D-%5Cnabla+f_k%5C%7C%5C%7Cp_k%5C%7C ,则有:
https://www.zhihu.com/equation?tex=+%28%5Cnabla+f_%7Bk%2B1%7D-%5Cnabla+f_k%29%5ETp_k%5Cleq+%5Calpha_k+L+%5C%7Cp_k%5C%7C%5E2+%5Cqquad+%288%29+
联立公式 https://www.zhihu.com/equation?tex=%287%2C8%29 可知:
https://www.zhihu.com/equation?tex=+%5Calpha_k+%5Cgeq+%5Cfrac%7Bc_2-1%7D%7BL%7D%5Cfrac%7B%5Cnabla+f_k%5ET+p_k%7D%7B%5C%7Cp_k%5C%7C%5E2%7D+%5Cqquad+%289%29+
又因为公式中 https://www.zhihu.com/equation?tex=c_1+%5Cnabla+f_k%5ETp_k%3C0 ,所以将公式 https://www.zhihu.com/equation?tex=%289%29 代入,有:
https://www.zhihu.com/equation?tex=+f_%7Bk%2B1%7D%5Cleq+f_k%2Bc_1%5Cfrac%7Bc_2-1%7D%7BL%7D%5Cfrac%7B%28%5Cnabla+f_k%5ETp_k%29%5E2%7D%7B%5C%7Cp_k%5C%7C%5E2%7D+%5Cqquad+%2810%29+
再将 https://www.zhihu.com/equation?tex=%5Ccos+%5Ctheta_k 带入:
https://www.zhihu.com/equation?tex=+f_%7Bk%2B1%7D%5Cleq+f_k%2Bc_1%5Cfrac%7Bc_2-1%7D%7BL%7D+%5Ccos%5E2%5Ctheta_k%5C%7C%5Cnabla+f_k%5C%7C%5E2+
其中,记 https://www.zhihu.com/equation?tex=c%3Dc_1%5Cfrac%7B1-c_2%7D%7BL%7D%3E0 ,带入:
https://www.zhihu.com/equation?tex=+f_k+-f_%7Bk%2B1%7D%5Cgeq+c%5Ccos%5E2%5Ctheta_k%5C%7C%5Cnabla+f_k%5C%7C%5E2+%5Cqquad+%2811%29+
将公式 https://www.zhihu.com/equation?tex=%2811%29 左右两边累加则有:
https://www.zhihu.com/equation?tex=+f_0-f_%7Bk%2B1%7D%5Cgeq+c%5CSigma_%7Bj%3D0%7D%5E%7Bk%7D%5Ccos%5E2%5Ctheta_j%5C%7C%5Cnabla+f_j%5C%7C%5E2+
是有下界的,趋近无穷大时, https://www.zhihu.com/equation?tex=f_0-f_%7Bk%7D%3C%5Cinfty ,所以
https://www.zhihu.com/equation?tex=+%5CSigma_%7Bk%3D0%7D%5E%7B%5Cinfty%7D%5Ccos%5E2%5Ctheta_k%5C%7C%5Cnabla+f_k%5C%7C%5E2%3C%5Cinfty+
所以证得Zoutendijk条件:
收敛性证明
上面我们已经证明了在一定条件下,可以推导出Zoutendijk条件:
更进一步,如果能够证明
https://www.zhihu.com/equation?tex=+%5Ccos%5Ctheta_k+%5Cgeq+%5Cdelta+%3E0%2C+%5Cforall+k++%5Cqquad%2812%29+
那么显然要想仍然满足Zoutendijk条件,则只能要求 https://www.zhihu.com/equation?tex=%5Clim_%7Bk%5Crightarrow+%5Cinfty%7D%5C%7C%5Cnabla+f_k%5C%7C%3D0 ,从而证明了收敛性。
观察公式 https://www.zhihu.com/equation?tex=%2812%29 ,也就是要求 https://www.zhihu.com/equation?tex=%5Ctheta_k 与 https://www.zhihu.com/equation?tex=90%5E%5Ccirc 有一定距离,也就是下降方向与最速下降方向不会趋近于正交。其实直观上来讲这一点还是很好理解的,既然是下降方向,那么它自然应该比较贴近最速下降方向,如果她与最速下降方向接近正交,那么沿着目标函数是趋近保持不变的,这显然不是算法构造的初衷。所以只要算法构造的能够保证与正交有一定距离,那么算法就是收敛的。
红色为最速下降方向,绿色为梯度方向,黄色为正交方向
分别来看一下,对于最速下降方向, https://www.zhihu.com/equation?tex=p_k+%3D+-%5Cnabla+f_k ,天然的 https://www.zhihu.com/equation?tex=%5Ccos%5Ctheta_k%3D1 ,则自然满足收敛性。
对于牛顿方向和拟牛顿方向,我们统一记 https://www.zhihu.com/equation?tex=p_k+%3D+-B_k%5E%7B-1%7D%5Cnabla+f_k ,其中是牛顿方向中的黑塞矩阵,拟牛顿方向中构建的近似黑塞矩阵,我们要求是正定的,并且有特征值 https://www.zhihu.com/equation?tex=0%3C%5Clambda_1%5Cleq%5Clambda_2%5Cleq%5Ccdots%5Cleq%5Clambda_n ,则有:
https://www.zhihu.com/equation?tex=+%5Ccos%5Ctheta_k+%3D+%5Cfrac%7B-%5Cnabla+f_k%5ET+p_k%7D%7B%5C%7C%5Cnabla+f_k%5C%7C+%5C%7Cp_k%5C%7C%7D+%3D%5Cfrac%7B%5Cnabla+f_k%5ET+B_k%5E%7B-1%7D+%5Cnabla+f_k%7D%7B%5C%7C%5Cnabla+f_k%5C%7C+%5C%7CB_k%5E%7B-1%7D%5Cnabla+f_k%5C%7C%7D+
https://www.zhihu.com/equation?tex=+%5Cgeq+%5Cfrac%7B%5Cfrac%7B1%7D%7B%5Clambda_n%7D%5C%7C%5Cnabla+f_k%5C%7C%5E2%7D%7B%5C%7C%5Cnabla+f_k%5C%7C+%5C%7CB_k%5E%7B-1%7D%5Cnabla+f_k%5C%7C%7D+%5Cgeq+%5Cfrac%7B%5Cfrac%7B1%7D%7B%5Clambda_n%7D%5C%7C%5Cnabla+f_k%5C%7C%5E2%7D%7B%5C%7CB_k%5E%7B-1%7D%5C%7C%5C%7C%5Cnabla+f_k%5C%7C%5E2%7D+%3D%5Cfrac%7B%5Cfrac%7B1%7D%7B%5Clambda_n%7D%5C%7C%5Cnabla+f_k%5C%7C%5E2%7D%7B%5Cfrac%7B1%7D%7B%5Clambda_1%7D%5C%7C%5Cnabla+f_k%5C%7C%5E2%7D+%3D%5Cfrac%7B%5Clambda_1%7D%7B%5Clambda_n%7D+%3D%5Cfrac%7B1%7D%7B%5Ckappa%28B_k%29%7D+%5Cqquad+%2813%29+
所以,只要的条件数满足 https://www.zhihu.com/equation?tex=%5Cfrac%7B1%7D%7B%5Ckappa%28B_k%29%7D%5Cgeq+%5Cdelta%2C+%5Cforall+k ,也就是:
https://www.zhihu.com/equation?tex=+%5Ckappa%28B_k%29%3D%5C%7CB_k%5C%7C%5C%7CB_k%5E%7B-1%7D%5C%7C%5Cleq+M%2C+%5Cforall+k+
那就满足了收敛性,其中 https://www.zhihu.com/equation?tex=M 为某一常数。
上述不等式的放缩需要一点矩阵条件数的知识,放到了末尾补充。
总结
总结一下,算法的收敛性要求当趋近于无穷大时,梯度趋近于 https://www.zhihu.com/equation?tex=0 ,也就是最优解。为了证明收敛性,我们首先证明在一定条件下满足Zoutendijk条件,进一步证明了如果能够保证下降方向远离与(负)梯度方向正交,便满足收敛性。而且分别分析了最速下降方向和(拟)牛顿方向的收敛性要求,其中最速下降方向自然满足,(拟)牛顿方向要求(近似)黑塞矩阵条件数小于某一常数。
矩阵条件数
向量范数是对向量长度一种衡量,也叫向量的模,常用的是二范数,也就是欧氏空间中的距离。我们这里也都是讨论的二范数,也有像是F范数,定义为矩阵所有元素平方和再开方(很类似向量),这里不多讨论。
矩阵的范数类似向量,可以理解为是对矩阵的坐标变换能力的衡量。
矩阵二范数的定义为:
https://www.zhihu.com/equation?tex=+%5C%7CB%5C%7C%3D%5Cmax_x%5Cfrac%7B%5C%7CAx%5C%7C%7D%7B%5C%7Cx%5C%7C%7D+%5Cqquad%2814%29+
其中 https://www.zhihu.com/equation?tex=x 是一个向量,也就是一个向量经过矩阵变换后的拉伸。
因此从公式 https://www.zhihu.com/equation?tex=%2814%29 中可知 https://www.zhihu.com/equation?tex=%5C%7CB%5C%7C%5Cgeq%5Cfrac%7B%5C%7CAx%5C%7C%7D%7B%5C%7Cx%5C%7C%7D%2C%5Cforall+x 。令 https://www.zhihu.com/equation?tex=B%3DB_%7Bk%7D%5E%7B-1%7D%2Cx%3D%5Cnabla+f_k ,带入公式中的分母,我们有:
https://www.zhihu.com/equation?tex=+%5C%7CB_k%5E%7B-1%7D%5Cnabla+f_k%5C%7C%5Cleq%5C%7CB_k%5E%7B-1%7D%5C%7C%5C%7C%5Cnabla+f_k%5C%7C+
由此可知不等式可以放缩。
另外:
https://www.zhihu.com/equation?tex=+%5C%7CB%5E%7B-1%7D%5C%7C%3D%5Cfrac%7B1%7D%7B%5Cmin_x%5Cfrac%7B%5C%7CAx%5C%7C%7D%7B%5C%7Cx%5C%7C%7D%7D+
可以类似压缩。
条件数定义为:
https://www.zhihu.com/equation?tex=+%5Ckappa%28B%29%3D%5C%7CB%5C%7C%5C%7CB%5E%7B-1%7D%5C%7C+
实际上,我们可以通过矩阵的奇异值来衡量放缩能力,上面可以直接得到:
https://www.zhihu.com/equation?tex=%5C%7CB%5C%7C%3D%5Csigma_%7B%5Cmax%7D
https://www.zhihu.com/equation?tex=%5C%7CB%5E%7B-1%7D%5C%7C%3D%5Cfrac%7B1%7D%7B%5Csigma_%7B%5Cmin%7D%7D
其中 https://www.zhihu.com/equation?tex=%5Csigma_%7B%5Cmax%7D%2C%5Csigma_%7B%5Cmin%7D 分别为最大最小奇异值。特别的,正定矩阵的奇异值与特征值相等。因此公式中 https://www.zhihu.com/equation?tex=%5C%7CB_k%5E%7B-1%7D%5C%7C%3D%5Cfrac%7B1%7D%7B%5Clambda_1%7D 看完很有感触,谢谢博主[爱]
页:
[1]