KaaPexei posted on 2021-12-4 15:32

Optimization Algorithms 3 | The Augmented Lagrangian Method (ALM) and Its Optimizer, the ADMM Algorithm ...

Background

This chapter assumes familiarity with the definition of the Lagrangian function and with duality.

The Method of Lagrange Multipliers

Let us first review the method of Lagrange multipliers.
For a general constrained problem,

$$\begin{align} {\rm objective:}\quad &\min f(x) \\ {\rm s.t.}\quad &h_i(x)=0, &(i=1,2,\dots,m) \\ &\mathcal{G}_j(x)\leq 0, &(j=1,2,\dots,k) \end{align}$$
we can rewrite it as

$$\mathcal{L}(x,\lambda,\mu)=f(x)+\sum_{i=1}^m \lambda_i h_i(x)+\sum_{j=1}^k \mu_j\mathcal{G}_j(x)$$

Question: what is the drawback of the Lagrangian approach?
To keep the derivation simple, we first consider only equality constraints.
Consider the equality-constrained problem:

$$\begin{align} {\rm objective:}\quad &\min f(x) \\ {\rm s.t.}\quad &Ax=b \end{align}$$
Its Lagrangian is:

$$\mathcal{L}(x,\lambda):=f(x)+\lambda^T(Ax-b)$$
Recall: duality
To optimize $\mathcal{L}$, we can use duality, i.e. the fact that the lower and upper bounds coincide at the optimal point:

$$\min_x \max_\lambda f(x)+\lambda^T(Ax-b) \quad (1)$$

Clearly, to maximize the lower bound, $\max_\lambda \lambda^T(Ax-b)$ takes the value $+\infty$ unless $Ax-b=0$.
The resulting function is therefore highly nonsmooth, and optimizing this Lagrangian is very difficult.
The Augmented Lagrangian Method (ALM)

Because the Lagrangian itself is so hard to optimize, we would like an approach whose optimum agrees with the Lagrangian's but whose optimization process is smooth.
We can try adding a penalty term (also called a "proximal point" term).
Suppose we already have a prior estimate $\bar{\lambda}$ of $\lambda$.
Introducing the penalty term $\frac{1}{2\rho}\|\lambda-\bar{\lambda}\|^2$ turns (1) into

$$\min_x \Big\lbrace \max_\lambda f(x)+\lambda^T(Ax-b)-\frac{1}{2\rho}\|\lambda-\bar{\lambda}\|^2 \Big\rbrace \quad (2)$$
The inner problem is now easy to optimize (a concave quadratic in $\lambda$); a single Newton step solves it exactly:

$$\lambda = \bar{\lambda}+\rho(Ax-b) \quad (3)$$
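To see why a single Newton step suffices: the inner objective in (2) is a concave quadratic in $\lambda$, so setting its gradient with respect to $\lambda$ to zero solves it exactly:

$$\frac{\partial}{\partial\lambda}\Big[\lambda^T(Ax-b)-\frac{1}{2\rho}\|\lambda-\bar{\lambda}\|^2\Big]=(Ax-b)-\frac{1}{\rho}(\lambda-\bar{\lambda})=0 \;\Rightarrow\; \lambda=\bar{\lambda}+\rho(Ax-b)$$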
Recall: duality. (If this step is unclear, revisit Optimization Algorithms 1 | The Lagrangian and Duality.)
Substituting (3) into (2) gives

$$\min_x f(x)+\bar{\lambda}^T(Ax-b)+\frac{\rho}{2}\|Ax-b\|^2 = \min_x \mathcal{L}(x,\bar{\lambda};\rho) \quad (4)$$
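To see where the sign flip comes from, substitute $\lambda=\bar{\lambda}+\rho(Ax-b)$ into the two $\lambda$-terms of (2):

$$\lambda^T(Ax-b)=\bar{\lambda}^T(Ax-b)+\rho\|Ax-b\|^2,\qquad -\frac{1}{2\rho}\|\lambda-\bar{\lambda}\|^2=-\frac{\rho}{2}\|Ax-b\|^2$$

Their sum is $\bar{\lambda}^T(Ax-b)+\frac{\rho}{2}\|Ax-b\|^2$, which is exactly the penalty term appearing in (4).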

$\mathcal{L}(x,\lambda;\rho)$ is the augmented Lagrangian:

$$\mathcal{L}(x,\lambda;\rho)=f(x)+\lambda^T(Ax-b)+\frac{\rho}{2}\|Ax-b\|^2 \quad (5)$$
The optimization process of the augmented Lagrangian method (ALM):



[*] Solve $\min_x\mathcal{L}(x,\bar{\lambda};\rho)$ → update $x$;
[*] Update $\bar{\lambda}$ via (3);
[*] Repeat until convergence.
Pseudocode:
for k = 1, ..., T (converged at k = T):

    $x_k = \arg\min_x \mathcal{L}(x,\lambda_{k-1};\rho)$;

    $\lambda_k=\lambda_{k-1}+\rho(Ax_k-b)$
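As a concrete illustration, here is a minimal numpy sketch of this pseudocode on a hypothetical toy problem (not from the post): minimize $\frac{1}{2}\|x\|^2$ subject to $Ax=b$, whose exact solution is the minimum-norm point $x^*=A^T(AA^T)^{-1}b$. All problem data below are made-up assumptions.

```python
import numpy as np

# Toy problem (assumed for illustration): min 1/2 ||x||^2  s.t.  Ax = b.
# With A = [1 1], b = 1 the exact solution is x* = [0.5, 0.5].
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
rho = 10.0

x = np.zeros(2)
lam = np.zeros(1)
for _ in range(50):
    # x-update: argmin_x L(x, lam; rho). For f(x) = 1/2 ||x||^2 this is the
    # linear system (I + rho A^T A) x = A^T (rho b - lam).
    x = np.linalg.solve(np.eye(2) + rho * A.T @ A, A.T @ (rho * b - lam))
    # dual update, equation (3): lam <- lam + rho (A x - b)
    lam = lam + rho * (A @ x - b)

print(x)  # approaches [0.5, 0.5]
```

Each dual step here contracts the error in $\lambda$ by a constant factor, so a few dozen iterations already reach machine-level accuracy on this toy instance.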

ALM with Inequality Constraints

Consider the inequality-constrained problem:

$$\begin{align} {\rm objective:}\quad &\min f(x) \\ {\rm s.t.}\quad &Ax\geq b \end{align} \quad (6)$$
Earlier, the Lagrange multiplier of an inequality constraint was usually written $\mu$; here, to make the contrast with the equality-constrained case easier to see, we write it as $\lambda$ as well.
Recall the KKT conditions --- dual feasibility: $\lambda\geq0$.
Compared with (1), the Lagrangian of (6) adds the constraint $\lambda\geq0$:

$$\min_x \max_{\lambda\geq0} f(x)+\lambda^T(Ax-b)$$
Accordingly, the update of $\lambda$ adds the same restriction to (3):

$$\lambda = \max\Big(\bar{\lambda}+\rho(Ax-b),\,0\Big) \quad (7)$$
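Another way to read (7): the inner maximization over $\lambda\geq0$ is a separable concave quadratic, so its solution is simply the unconstrained maximizer from (3) projected componentwise onto the nonnegative orthant:

$$\lambda=\Pi_{\{\lambda\geq0\}}\big(\bar{\lambda}+\rho(Ax-b)\big)=\max\big(\bar{\lambda}+\rho(Ax-b),\,0\big)$$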
However, (7) does not look so easy to work with either; we would still like to reduce it to the form of (3), so that the solution method matches the equality-constrained case above.

Recall the equality-constrained problem:

$$\min_x f(x) \quad {\rm s.t.}\ \ c(x)=0$$
The inequality-constrained problem reads:

$$\min_x f(x) \quad {\rm s.t.}\ \ c(x)\geq0$$
For convenience, we can express the inequality constraint in a form analogous to an equality constraint by introducing a slack variable $s$:

$$\min_x f(x) \quad {\rm s.t.}\ \ c(x)=s,\ s\geq0$$

The augmented Lagrangian of the inequality-constrained problem is therefore

$$\mathcal{L}(x,s,\lambda;\rho)=f(x)+\lambda^T(c(x)-s)+\frac{\rho}{2}\|c(x)-s\|_2^2 \quad (8)$$
Its optimization process is
for k = 1, ..., T (converged at k = T):

    $(x_k, s_k) = \arg\min_{x,s} \mathcal{L}(x,s,\lambda_{k-1};\rho)$, s.t. $s\geq0$;

    $\lambda_k=\lambda_{k-1}+\rho\big(c(x_k)-s_k\big)$

The ADMM Algorithm (Alternating Direction Method of Multipliers)

Why do we need ADMM?
Looking at (8), the primal objective function now involves two blocks of variables, $(x,s)$. To handle this, we can bring in ADMM, which optimizes the two blocks alternately and thereby solves the two-variable problem.
How ADMM works
Consider the target problem

$$\begin{align} {\rm objective:}\quad &\min_{(x,z)} f(x)+h(z) \\ {\rm s.t.}\quad &Ax+Bz=c \end{align}$$
Its augmented Lagrangian is

$$\mathcal{L}(x,z,\lambda;\rho)=f(x)+h(z)+\lambda^T(Ax+Bz-c)+\frac{\rho}{2}\|Ax+Bz-c\|_2^2$$
Although $x$ and $z$ look like separable optimization variables, the quadratic "proximal point" penalty we added couples them: decreasing it requires optimizing $(x,z)$ jointly. (If this is unclear, review gradient descent on a two-variable function $f(x,z)$.)
We therefore optimize $x$ and $z$ separately and sequentially:
for k = 1, ..., T (converged at k = T):

    $x_k = \arg\min_x \mathcal{L}(x,z_{k-1},\lambda_{k-1};\rho)$;

    $z_k = \arg\min_z \mathcal{L}(x_k,z,\lambda_{k-1};\rho)$;

    $\lambda_k=\lambda_{k-1}+\rho(Ax_k+Bz_k-c)$

In this way we can optimize the augmented Lagrangian (8) of the inequality-constrained problem.
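As a minimal sketch of this alternating scheme, consider a hypothetical one-dimensional instance (all data assumed, not from the post): minimize $\frac{1}{2}(x-2)^2$ subject to $x\geq3$, i.e. $c(x)=x-3\geq0$, whose solution is $x^*=3$. Both block updates have closed forms here.

```python
# Toy problem (assumed): min 1/2 (x-2)^2  s.t.  c(x) = x - 3 >= 0.
# Augmented Lagrangian (8): L = f(x) + lam*(c(x)-s) + rho/2*(c(x)-s)^2, s >= 0.
rho = 10.0
x, s, lam = 0.0, 0.0, 0.0
for _ in range(100):
    # x-update: setting dL/dx = (x-2) + lam + rho*(x-3-s) = 0 gives
    x = (2.0 - lam + rho * (3.0 + s)) / (1.0 + rho)
    # s-update: minimizing over s >= 0 gives s = c(x) + lam/rho, clipped at 0
    s = max(x - 3.0 + lam / rho, 0.0)
    # dual update: lam <- lam + rho*(c(x) - s)
    lam = lam + rho * (x - 3.0 - s)

print(x)  # approaches 3.0
```

Note that with this sign convention ($+\lambda^T(c(x)-s)$ in (8)), the multiplier converges to $\lambda=-1$; under the more common convention $f-\mu^T c$ the same solution has $\mu=1$.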

A Variant of ADMM

Sometimes even the $\arg\min$ is hard to compute directly; we can instead take gradient-descent steps toward it, which gives:
for k = 1, ..., T (converged at k = T):

    $x_k = x_{k-1}-\alpha_x\dfrac{\partial \mathcal{L}(x,z_{k-1},\lambda_{k-1};\rho)}{\partial x}$

    $z_k = z_{k-1}-\alpha_z\dfrac{\partial \mathcal{L}(x_k,z,\lambda_{k-1};\rho)}{\partial z}$

    $\lambda_k=\lambda_{k-1}+\rho(Ax_k+Bz_k-c)$

Here $\alpha$ is the gradient-descent learning rate.
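A sketch of this gradient-step variant on a hypothetical toy problem (all data assumed, not from the post): $f(x)=\frac{1}{2}(x-1)^2$, $h(z)=\frac{1}{2}(z-2)^2$, constraint $x-z=0$ (so $A=1$, $B=-1$, $c=0$), with optimum $x=z=1.5$. The step sizes $\alpha_x,\alpha_z$ are assumed values, not tuned prescriptions.

```python
# Toy problem (assumed): min 1/2 (x-1)^2 + 1/2 (z-2)^2  s.t.  x - z = 0.
rho, alpha_x, alpha_z = 1.0, 0.1, 0.1
x = z = lam = 0.0
for _ in range(500):
    # gradient step on x: dL/dx = f'(x) + lam*A + rho*A*(A x + B z - c)
    x = x - alpha_x * ((x - 1.0) + lam + rho * (x - z))
    # gradient step on z uses the fresh x_k: dL/dz = h'(z) + lam*B + rho*B*(A x + B z - c)
    z = z - alpha_z * ((z - 2.0) - lam - rho * (x - z))
    # dual ascent on the constraint residual
    lam = lam + rho * (x - z)

print(x, z)  # both approach 1.5
```

Because each outer iteration takes only one gradient step per block instead of a full $\arg\min$, this variant typically needs many more iterations than exact ADMM, as the larger loop count here suggests.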

The AMA Algorithm

for k = 1, ..., T (converged at k = T):

    $x_k = \arg\min_x \mathcal{L}(x,z_{k-1},\lambda_{k-1};0)$;

    $z_k = \arg\min_z \mathcal{L}(x_k,z,\lambda_{k-1};\rho)$;

    $\lambda_k=\lambda_{k-1}+\rho(Ax_k+Bz_k-c)$

The difference from ADMM is that the $x$-update sets $\rho=0$, i.e. it minimizes the ordinary (un-augmented) Lagrangian rather than the augmented one.
References:

[*] Stanford CS229, Lecture Notes, Part V: http://cs229.stanford.edu/notes2021fall/cs229-notes3.pdf
[*] CMU Statistics, Convex Optimization, Karush-Kuhn-Tucker Conditions: http://stat.cmu.edu/~ryantibs/convexopt/lectures/kkt.pdf
[*] Slides by 学弱猹, SVM: An Application of KKT Conditions: Li Liu - Personal Website
[*] University of Rochester, CSC 576, Alternating Direction Method of Multipliers (ADMM): https://www.cs.rochester.edu/u/jliu/CSC-576/class-note-11.pdf
[*] University of Wisconsin-Madison, Augmented Lagrangian Methods: http://pages.cs.wisc.edu/~swright/nd2016/IMA_augmentedLagrangian.pdf

Ilingis posted on 2021-12-4 15:39

Thanks! A question about global consensus ADMM: if we need to solve for two global variables z, where z1 is n x-vectors and d is m y-vectors, can the problem still be expanded the same way — writing the constraint term as a sum of two pieces and the penalty term as two added pieces, and updating x, d, z1, z2, and the multipliers in turn during the iterations?

APSchmidt posted on 2021-12-4 15:40

I don't quite follow the question — what is z2? If you are asking about solving with multiple variables: having several variables x1, x2, x3 under the same constraint function f is fine; see multivariate gradient descent for the details.

Doris232 posted on 2021-12-4 15:46

Thank you so much! I mistyped yesterday: z1 is n x-vectors and z2 is m d-vectors. The iteration can then be written as: the k-th x and d; then z1; then z2; then the k-th multipliers mu1 and mu2. That is what I meant to ask — I believe this works. Thanks!

量子计算9 posted on 2021-12-4 15:47

Awesome, bro! [heart]

redhat9i posted on 2021-12-4 15:52

Why does the minus sign become a plus sign when (3) is substituted into (2)?

APSchmidt posted on 2021-12-4 15:58

By that point the lambda is already the estimate bar{lambda}, and it cancels against the preceding term.