[优化]Levenberg-Marquardt 最小二乘优化

DungDaj · 发表于 2021-12-15 07:06

LM(Levenberg-Marquardt)算法属于信赖域法，将变量行走的长度控制在一定的信赖域之内，保证泰勒展开有很好的近似效果。
LM算法使用了一种带阻尼的高斯－牛顿方法。

１．理论

最小二乘问题

$x=arg\min_{x} {F(x)}=arg\min_{x} \frac{1}{2} \sum_{i=1}^N\parallel f(x) \parallel ^2 \\ F(x)=\frac{1}{2} \sum_{i=1}^N\parallel f(x) \parallel ^2=\frac{1}{2}\parallel\mathbf{f}(x)\parallel^2=\frac{1}{2}\mathbf{f}(x)^T\mathbf{f}(x)$
将 $\mathbf{f}$ 一阶泰勒展开：

$\mathbf{f}(x+h)=\mathbf{f}(x)+\mathbf{J}(x) h + O(h^Th) \\$
去掉高阶项，带入到 $F$ :

$F(x+h)\approx L(h)=\frac{1}{2}\mathbf{f}^T\mathbf{f}+h^T\mathbf{J}\mathbf{f}+\frac{1}{2}h^T\mathbf{J}^T\mathbf{J}h \\$
阻尼法的的思想是再加入一个阻尼项 $\frac{1}{2}\mu h^Th$ ：

$h=arg\min_h {\mathbf{G}(h)} = arg\min_h \frac{1}{2}\mathbf{f}^T\mathbf{f}+h^T\mathbf{J}^T\mathbf{f}+\frac{1}{2}h^T\mathbf{J}^T\mathbf{J}h + \frac{1}{2}\mu h^Th\\$
对上式求偏导数，并令为０．

$\mathbf{J}^Tf+\mathbf{J}^T\mathbf{J}h+\mu h= 0 \\ (\mathbf{J}^T\mathbf{J}+\mu \mathbf{I})h=-\mathbf{J}^Tf\\ (\mathbf{H}+\mu \mathbf{I} )h=-g$

阻尼参数的作用有：

1. 对与 $\mu>0$ ， $(\mathbf{H}+\mu \mathbf{I} )$ 正定，保证了  是梯度下降的方向。
2. 当  较大时： $h\approx-\frac{1}{\mu}g=-\frac{1}{\mu}F'(x)$ ，其实就是梯度、最速下降法，当离最终结果较远的时候，很好。
3. 当  较小时，方法接近与高斯牛顿，当离最终结果很近时，可以获得二次收敛速度，很好。

看来，的选取很重要。初始时，取

$\mathbf{A}_0=\mathbf{J}(x_0)^T\mathbf{J}(x_0)\\ \mu_0=\tau \cdot \max_i\{a_{ii}^0\}$
其他情况，利用cost增益来确定:

$\varrho=\frac{F(x)-f(x+h)}{ L(0)-L(h)} \\$
迭代终止条件

１．一阶导数为０： $F'(x)=g(x)=0$ ，使用 $\parallel g(x) \parallel < \varepsilon_1$ , $\varepsilon_1$ 是设定的终止条件；
２．ｘ变化步距离足够小， $\parallel h\parallel=\parallel x_{new}-x \parallel < \varepsilon_2( \parallel x \parallel + \varepsilon_2 )$ ；
３．超过最大迭代次数。

LM算法的步骤为

begin
$k:=0,\ \nu:=2;\ x:=x_0$
$H:=J^TJ;\ g:=J^Tf$
$found=(\parallel g \parallel _\infty < \varepsilon_1);\ \mu:=\tau\cdot max\{a_{ii}\}$
while(not found) and k < kmax
$k:= k +1; Solve(H+\mu I) h = -g$
      if( $\parallel h\parallel< \varepsilon_2( \parallel x \parallel + \varepsilon_2 )$
         found = true;
      else
$x_{new}:= x+h$
$\varrho=\frac{F(x)-f(x+h)}{ L(0)-L(h)}$
         if( $\varrho>0$ ) {判断能不能接收这一步}
$x:=x_{new}$
$H=J^TJ; \ g:=J^Tf$
$found = (\parallel g \parallel_\infty < \varepsilon_1)$
$\mu:=\mu \cdot max\{\frac13, 1-(2\varrho-1)^3 \};\ \nu=2$
         else
$\mu:=\mu \cdot \nu; \nu = 2 \cdot \nu$

end

2. 算法实现

问题：（高斯牛顿同款问题）非线性方程: $y=exp(ax^2+bx+c)$ ，给定 $N$ 组观测数据 $\{x_i,y_i\}$ ，求系数 $X=[ a,b,c ]^T$ .
分析：令 $f(X)=y-exp(ax^2+bx+c)$ ,N组数据可以组成一个大的非线性方程组

$F(X)=\left[\begin{array}[c]{c} y_1-exp(ax_1^2+bx_1+c)\\ \dots\\ y_N-exp(ax_N^2+bx_N+c)\end{array} \right]$
我们可以构建一个最小二乘问题：

$x = \mathrm{arg}\min_{x}\frac{1}{2}\parallel F(X) \parallel^2$ .
要求解这个问题，根据推导部分可知，需要求解雅克比。
$J(X)=\left[\begin{matrix} -x_1^2exp(ax_1^2+bx_1+c) & -x_1exp(ax_1^2+bx_1+c) &-exp(ax_1^2+bx_1+c) \\ \dots & \dots & \dots \\ -x_N^2exp(ax_N^2+bx_N+c) & -x_Nexp(ax_N^2+bx_N+c) &-exp(ax_N^2+bx_N+c) \end{matrix} \right]$

使用推导部分所述的步骤就可以进行解算。代码实现：
/**
* This file is part of LevenbergMarquardt Solver.
*
* Copyright (C) 2018-2020 Dongsheng Yang <ydsf16@buaa.edu.cn> (Beihang University)
* For more information see <https://github.com/ydsf16/LevenbergMarquardt>
*
* LevenbergMarquardt is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* LevenbergMarquardt is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with LevenbergMarquardt. If not, see <http://www.gnu.org/licenses/>.
*/

#include <iostream>
#include <eigen3/Eigen/Core>
#include <eigen3/Eigen/Dense>
#include <opencv2/opencv.hpp>
#include <eigen3/Eigen/Cholesky>
#include <chrono>

/* 计时类 */
class Runtimer{
public:
inline void start()
{
      t_s_  = std::chrono::steady_clock::now();
}

inline void stop()
{
      t_e_ = std::chrono::steady_clock::now();
}

inline double duration()
{
      return std::chrono::duration_cast<std::chrono::duration<double>>(t_e_ - t_s_).count() * 1000.0;
}

private:
std::chrono::steady_clock::time_point t_s_; //start time ponit
std::chrono::steady_clock::time_point t_e_; //stop time point
};

/*  优化方程 */
class LevenbergMarquardt{
public:
LevenbergMarquardt(double* a, double* b, double* c):
a_(a), b_(b), c_(c)
{
      epsilon_1_ = 1e-6;
      epsilon_2_ = 1e-6;
      max_iter_ = 50;
      is_out_ = true;
}

void setParameters(double epsilon_1, double epsilon_2, int max_iter, bool is_out)
{
      epsilon_1_ = epsilon_1;
      epsilon_2_ = epsilon_2;
      max_iter_ = max_iter;
      is_out_ = is_out;
}

void addObservation(const double& x, const double& y)
{
      obs_.push_back(Eigen::Vector2d(x, y));
}

void calcJ_fx()
{
      J_ .resize(obs_.size(), 3);
      fx_.resize(obs_.size(), 1);

      for ( size_t i = 0; i < obs_.size(); i ++)
      {
         const Eigen::Vector2d& ob = obs_.at(i);
         const double& x = ob(0);
         const double& y = ob(1);
         double j1 = -x*x*exp(*a_ * x*x + *b_*x + *c_);
         double j2 = -x*exp(*a_ * x*x + *b_*x + *c_);
         double j3 = -exp(*a_ * x*x + *b_*x + *c_);
         J_(i, 0 ) = j1;
         J_(i, 1) = j2;
         J_(i, 2) = j3;
         fx_(i, 0) = y - exp( *a_ *x*x + *b_*x +*c_);
      }
}

void calcH_g()
{
      H_ = J_.transpose() * J_;
      g_ = -J_.transpose() * fx_;
}

double getCost()
{
      Eigen::MatrixXd cost= fx_.transpose() * fx_;
      return cost(0,0);
}

double F(double a, double b, double c)
{
      Eigen::MatrixXd fx;
      fx.resize(obs_.size(), 1);

      for ( size_t i = 0; i < obs_.size(); i ++)
      {
         const Eigen::Vector2d& ob = obs_.at(i);
         const double& x = ob(0);
         const double& y = ob(1);
         fx(i, 0) = y - exp( a *x*x + b*x +c);
      }
      Eigen::MatrixXd F = 0.5 * fx.transpose() * fx;
      return F(0,0);
}

double L0_L( Eigen::Vector3d& h)
{
         Eigen::MatrixXd L = -h.transpose() * J_.transpose() * fx_ - 0.5 * h.transpose() * J_.transpose() * J_ * h;
         return L(0,0);
}

void solve()
{
      int k = 0;
      double nu = 2.0;
      calcJ_fx();
      calcH_g();
      bool found = ( g_.lpNorm<Eigen::Infinity>() < epsilon_1_ );

      std::vector<double> A;
      A.push_back( H_(0, 0) );
      A.push_back( H_(1, 1) );
      A.push_back( H_(2,2) );
      auto max_p = std::max_element(A.begin(), A.end());
      double mu = *max_p;

      double sumt =0;

      while ( !found && k < max_iter_)
      {
         Runtimer t;
         t.start();

         k = k +1;
         Eigen::Matrix3d G = H_ + mu * Eigen::Matrix3d::Identity();
         Eigen::Vector3d h = G.ldlt().solve(g_);

         if( h.norm() <= epsilon_2_ * ( sqrt(*a_**a_ + *b_**b_ + *c_**c_ ) +epsilon_2_ ) )
            found = true;
         else
         {
            double na = *a_ + h(0);
            double nb = *b_ + h(1);
            double nc = *c_ + h(2);

            double rho =( F(*a_, *b_, *c_) - F(na, nb, nc) )  / L0_L(h);

            if( rho > 0)
            {
                  *a_ = na;
                  *b_ = nb;
                  *c_ = nc;
                  calcJ_fx();
                  calcH_g();

                  found = ( g_.lpNorm<Eigen::Infinity>() < epsilon_1_ );
                  mu = mu * std::max<double>(0.33, 1 - std::pow(2*rho -1, 3));
                  nu = 2.0;
            }
            else
            {
                  mu = mu * nu;
                  nu = 2*nu;
            }// if rho > 0
         }// if step is too small

         t.stop();
         if( is_out_ )
         {
            std::cout << &#34;Iter: &#34; << std::left <<std::setw(3) << k << &#34; Result: &#34;<< std::left <<std::setw(10)  << *a_ << &#34; &#34; << std::left <<std::setw(10)  << *b_ << &#34; &#34; << std::left <<std::setw(10) << *c_ <<
            &#34; step: &#34; << std::left <<std::setw(14) << h.norm() << &#34; cost: &#34;<< std::left <<std::setw(14)  << getCost() << &#34; time: &#34; << std::left <<std::setw(14) << t.duration()  <<
            &#34; total_time: &#34;<< std::left <<std::setw(14) << (sumt += t.duration()) << std::endl;
         }
      } // while

      if( found  == true)
         std::cout << &#34;\nConverged\n\n&#34;;
      else
         std::cout << &#34;\nDiverged\n\n&#34;;

}//function

Eigen::MatrixXd fx_;
Eigen::MatrixXd J_; // 雅克比矩阵
Eigen::Matrix3d H_; // H矩阵
Eigen::Vector3d g_;

std::vector< Eigen::Vector2d> obs_; // 观测

/* 要求的三个参数 */
double* a_, *b_, *c_;

/* parameters */
double epsilon_1_, epsilon_2_;
int max_iter_;
bool is_out_;
};//class LevenbergMarquardt
int main(int argc, char **argv) {
const double aa = 0.1, bb = 0.5, cc = 2; // 实际方程的参数
double a =0.0, b=0.0, c=0.0; // 初值

/* 构造问题 */
LevenbergMarquardt lm(&a, &b, &c);
lm.setParameters(1e-10, 1e-10, 100, true);

/* 制造数据 */
const size_t N = 100; //数据个数
cv::RNG rng(cv::getTickCount());
for( size_t i = 0; i < N; i ++)
{
      /* 生产带有高斯噪声的数据　*/
      double x = rng.uniform(0.0, 1.0) ;
      double y = exp(aa*x*x + bb*x + cc) + rng.gaussian(0.05);

      /* 添加到观测中　*/
      lm.addObservation(x, y);
}
/* 用LM法求解 */
lm.solve();

return 0;
}

ChuanXin · 发表于 2021-12-15 07:12

最开始的最小二乘法那里，求和参数为i，但是数学表达式里面却不含i

mypro334 · 发表于 2021-12-15 07:21

接受新的一步后，\mu值的更新的依据是什么，或者针对这个问题有什么好的参考？

franciscochonge · 发表于 2021-12-15 07:25

下面用的黑体，表示矩阵

mastertravels77 · 发表于 2021-12-15 07:31

你好，公式应该是这样F(x+h)=L(h)=0.5fTf + hTjTf + 0.5*hTjTjh，你公式里是hTjf，少了个转置，代码中没少

acecase · 发表于 2021-12-15 07:39

确实，公式有此问题

zifa2003293 · 发表于 2021-12-15 07:41

请问一下cost增益是什么原理？搜索了一下没搜到这方面的资料

ainatipen · 发表于 2021-12-15 07:49

damping factor的初始值选取为什么是列出的那个方程，是什么原理呢

kyuskoj · 发表于 2021-12-15 07:59

http://www2.imm.dtu.dk/documents/ftp/tr99/tr05_99.pdf可以读读这个

LiteralliJeff · 发表于 2021-12-15 08:00

这个推导里面为何导数为0的点就是极小值呀，还有可能时极大值或者鞍点呢

		自动登录	找回密码
密码			立即注册

[优化]Levenberg-Marquardt 最小二乘优化

浏览过的版块