【ADMM笔记】(二)ADMM算法细节、变体和常见技巧
论文题目:《Distributed optimization and statistical learning via the alternating direction method of multipliers》论文地址:
这篇笔记对应论文的章节:
3 Alternating Direction Method of Multipliers3.1 Algorithm3.3 Optimality Conditions and Stopping Criterion3.4 Extensions and Variations4 General Patterns4.3 Smooth Objective Terms
Algorithm
考虑问题:
https://www.zhihu.com/equation?tex=%5Cmin+f%28x%29+%2B+g%28z%29+%5Ctag%7B1a%7D+%5C%5C
https://www.zhihu.com/equation?tex=subject+%5Cquad+to.+%5Cquad+Ax+%2B+Bz+%3D+c+%5Ctag%7B1b%7D+%5C%5C
ADMM两种形式
unscaled form ADMM
上一个文章已经给了一种形式的ADMM算法,我们称之为unscaled form:
https://www.zhihu.com/equation?tex=x%5E%7Bk%2B1%7D+%3A%3D+%5Carg%5Cmin_x+L_%7B%5Crho%7D+%28x%2Cz%5Ek%2Cy%5Ek%29+%5Ctag%7B2a%7D+%5C%5C
https://www.zhihu.com/equation?tex=y%5E%7Bk%2B1%7D+%3A%3D+y%5Ek+%2B+%5Crho%28Ax%5E%7Bk%2B1%7D+%2B+Bz%5E%7Bk%2B1%7D+-c+%29+%5Ctag%7B2c%7D+%5C%5C
scaled form ADMM
https://www.zhihu.com/equation?tex=L_%7B%5Crho%7D%28x%2Cz%2Cy%29+%3D+f%28x%29+%2B+y%5ET+r+%2B++%5Cfrac%7B%5Crho%7D%7B2%7D+%7C%7Cr%7C%7C%5E2_2%2C%5Cquad+r%3DAx%2BBz-c+%5C%5C
化简得:
https://www.zhihu.com/equation?tex=L_%7B%5Crho%7D%28x%2Cz%2Cy%29+%3D+f%28x%29+%2B%5Cfrac%7B%5Crho%7D%7B2%7D+%7C%7Cr+%2B+%5Cfrac%7B1%7D%7B%5Crho%7D+y%7C%7C%5E2_2+-+%5Cfrac%7B%5Crho%7D%7B2%7D%7C%7C%5Cfrac%7B1%7D%7B%5Crho%7D+y%7C%7C%5E2_2+%5C%5C
令 https://www.zhihu.com/equation?tex=u+%3D+%5Cfrac%7B1%7D%7B%5Crho%7D+y :
https://www.zhihu.com/equation?tex=L_%7B%5Crho%7D%28x%2Cz%2Cu%29+%3D+f%28x%29+%2B%5Cfrac%7B%5Crho%7D%7B2%7D+%7C%7Cr+%2B+u%7C%7C%5E2_2+-+%5Cfrac%7B%5Crho%7D%7B2%7D%7C%7Cu%7C%7C%5E2_2++%5C%5C
由此ADMM方法变为:
https://www.zhihu.com/equation?tex=x%5E%7Bk%2B1%7D+%3A%3D+%5Carg%5Cmin_x+%28f%28x%29+%2B%5Cfrac%7B%5Crho%7D%7B2%7D+%7C%7CAx%2BBz%5Ek-c+%2B+u%5Ek%7C%7C%5E2_2%29+%5Ctag%7B3a%7D+%5C%5C
https://www.zhihu.com/equation?tex=z%5E%7Bk%2B1%7D+%3A%3D+%5Carg%5Cmin_z+%28g%28z%29+%2B%5Cfrac%7B%5Crho%7D%7B2%7D+%7C%7CAx%5E%7Bk%2B1%7D+%2B+Bz+-+c+%2B+u%5Ek%7C%7C%5E2_2%29+%5Ctag%7B3b%7D+%5C%5C
https://www.zhihu.com/equation?tex=u%5E%7Bk%2B1%7D+%3A%3D+u%5Ek+%2B+Ax%5E%7Bk%2B1%7D+%2B+Bz%5E%7Bk%2B1%7D+-c++%5Ctag%7B3c%7D+%5C%5C
(3)被称为scaled form ADMM。
最优性条件
记原问题最优解为,对偶问题最优解为。考虑(1)的KKT条件:
https://www.zhihu.com/equation?tex=L%28x%2Cy%29+%3D+f%28x%29+%2B+g%28z%29+%2B+y%5ET%28Ax+%2BBz+-c%29+%5C%5C
https://www.zhihu.com/equation?tex=%5Cnabla_x+L+%3D+%5Cnabla+f%28x%5E%2A%29+%2B+A%5ET+y%5E%2A%3D+0+%5Ctag%7B4a%7D%5C%5C
https://www.zhihu.com/equation?tex=%5Cnabla_z+L+%3D+%5Cnabla+g%28z%5E%2A%29+%2B+B%5ET+y%5E%2A%3D+0+%5Ctag%7B4b%7D%5C%5C
同时需要满足约束:
https://www.zhihu.com/equation?tex=Ax%5E%2A+%2B+By%5E%2A+%3D+c+%5Ctag%7B4c%7D+%5C%5C
根据对偶理论知:
(4a)、(4b)、(4c)是、分别为原问题和对偶问题最优解的充分必要条件。
Remark 1. (4a)、(4b)被称为dual feasibility,(4c)被称为primal feasibility。
终止准则
(4a)、(4b)、(4c)给了最优性的条件,根据这一条件,我们利用迭代点与最优性之间的差距作为终止条件。
比如对(4c):
https://www.zhihu.com/equation?tex=r%5E%7Bk%2B1%7D+%3D+Ax%5E%7Bk%2B1%7D+%2B+Bz%5E%7Bk%2B1%7D+-c+%5Ctag%7B5a%7D+%5C%5C
https://www.zhihu.com/equation?tex=r%5E%7Bk%2B1%7D 被称为primal residual,终止时显然需要有 https://www.zhihu.com/equation?tex=r%5E%7Bk%2B1%7D+%5Cle+%5Cepsilon%5E%7Bpri%7D+%5Ctag%7B6a%7D
现在需要判断迭代结果尽可能接近(4a)、(4b)条件。
根据(2b):
因此, https://www.zhihu.com/equation?tex=L_%7B%5Crho%7D+%28x%5E%7Bk%2B1%7D%2Cz%2Cy%5Ek%29 在 https://www.zhihu.com/equation?tex=z+%3D+z%5E%7Bk%2B1%7D 处导数为0:
https://www.zhihu.com/equation?tex=+0+%3D+%5Cnabla+g%28z%5E%7Bk%2B1%7D%29+%2B+B%5ETy%5Ek+%2B+%5Crho+B%5ET%28Ax%5E%7Bk%2B1%7D+%2B+Bz%5E%7Bk%2B1%7D+-+c%29+%5C%5C
https://www.zhihu.com/equation?tex=%3D+%5Cnabla+g%28z%5E%7Bk%2B1%7D%29+%2B+B%5ETy%5Ek+%2B+%5Crho+B%5ETr%5E%7Bk%2B1%7D+%3D+%5Cnabla+g%28z%5E%7Bk%2B1%7D%29+%2B+B%5ETy%5E%7Bk%2B1%7D+%5C%5C
这说明在迭代的过程中(4b)始终成立。
同样,根据(2a), https://www.zhihu.com/equation?tex=L_%7B%5Crho%7D+%28x%2Cz%5Ek%2Cy%5Ek%29 在 https://www.zhihu.com/equation?tex=x+%3D+x%5E%7Bk%2B1%7D 处导数为0:
https://www.zhihu.com/equation?tex=+0+%3D+%5Cnabla+f%28x%5E%7Bk%2B1%7D%29+%2B+A%5ETy%5Ek+%2B+%5Crho+A%5ET%28Ax%5E%7Bk%2B1%7D+%2B+Bz%5Ek+-+c%29+%5C%5C
https://www.zhihu.com/equation?tex=%3D+%5Cnabla+f%28x%5E%7Bk%2B1%7D%29+%2B+A%5ETy%5Ek+%2B+%5Crho+A%5ET%28Ax%5E%7Bk%2B1%7D+%2B+Bz%5E%7Bk%2B1%7D+-+c%29+%2B+%5Crho+A%5ETB%28z%5Ek+-+z%5E%7Bk%2B1%7D%29%3D+%5Cnabla+f%28x%5E%7Bk%2B1%7D%29+%2B+A%5ETy%5E%7Bk%2B1%7D+%2B+%5Crho+A%5ETB%28z%5Ek+-+z%5E%7Bk%2B1%7D%29+%5C%5C
因此:
https://www.zhihu.com/equation?tex=+%5Cnabla+f%28x%5E%7Bk%2B1%7D%29+%2B+A%5ETy%5E%7Bk%2B1%7D+%3D++%5Crho+A%5ETB%28z%5E%7Bk%2B1%7D+-+z%5Ek%29++%5Ctag%7B7%7D%5C%5C
根据(7)可知,(4a)并不满足。
我们定义dual residual: https://www.zhihu.com/equation?tex=s%5E%7Bk%2B1%7D+%3D+%5Crho+A%5ETB%28z%5E%7Bk%2B1%7D+-+z%5Ek%29 。
我们希望:
https://www.zhihu.com/equation?tex=s%5E%7Bk%2B1%7D+%5Cle+%5Cepsilon%5E%7Bdual%7D+%5Ctag%7B6b%7D+%5C%5C
由此(6a)、(6b)是终止条件,保证当前迭代点尽可能接近最优性条件(4a)、(4b)、(4c)。
Extenstions
(2)、(3)是Standard ADMM的两种形式。基于Standard ADMM有非常多的扩展,这里就列举两个例子。
可变的惩罚参数(Varying Penalty Parameter)
之前讨论的惩罚系数 https://www.zhihu.com/equation?tex=%5Crho+ 都是常数。论文中给了一个变化的惩罚参数:
一般性的增广项(General Argumenting Terms)
(2)的增广项为,现在可以用更一般的形式来代替:https://www.zhihu.com/equation?tex=%5Cfrac%7B%5Crho%7D%7B2%7D+r%5ETPr。其中P是自己设置的正定矩阵。当P=I是,就是。
Technique
ADMM算法迭代的时候也有很多小技巧。
非精确最小化(Inexact Minimization)
(2a)、(2b)计算最优解的时候,不用精确搜索,可以做一个非精确搜索,找一个次优解,这样会减少迭代次数。
热启动(Warm Start)
(2a)、(2b)迭代的时候,不用每次都从0点或者一个固定点开始迭代,每次迭代可以从上一次位置开始。
读书笔记小结
这篇笔记中讨论了ADMM算法的两种形式、最优性条件、终止准则。之后也补充了一些ADMM算法的扩展与技巧。
这些都是基于(1)这个问题的。(1)这个问题只有等式约束,如果遇到不等式约束怎么办,下一篇笔记会引入近端算子,来处理含不等式约束或者集合约束这一类问题。 韩德仁老师最近写了新的综述可以参考下,从DRS理解ADMM可以看18年开的一个会议,油管能搜到,汇报的都是这个领域活跃者[飙泪笑] 嗯嗯,感谢提醒
页:
[1]