[优化]Levenberg-Marquardt 最小二乘优化

DungDaj 发表于 2021-12-15 07:06

LM(Levenberg-Marquardt)算法属于信赖域法，将变量行走的长度控制在一定的信赖域之内，保证泰勒展开有很好的近似效果。
LM算法使用了一种带阻尼的高斯－牛顿方法。１．理论

最小二乘问题

https://www.zhihu.com/equation?tex=x%3Darg%5Cmin_%7Bx%7D+%7BF%28x%29%7D%3Darg%5Cmin_%7Bx%7D+%5Cfrac%7B1%7D%7B2%7D+%5Csum_%7Bi%3D1%7D%5EN%5Cparallel+f%28x%29+%5Cparallel+%5E2+%5C%5C+F%28x%29%3D%5Cfrac%7B1%7D%7B2%7D+%5Csum_%7Bi%3D1%7D%5EN%5Cparallel+f%28x%29+%5Cparallel+%5E2%3D%5Cfrac%7B1%7D%7B2%7D%5Cparallel%5Cmathbf%7Bf%7D%28x%29%5Cparallel%5E2%3D%5Cfrac%7B1%7D%7B2%7D%5Cmathbf%7Bf%7D%28x%29%5ET%5Cmathbf%7Bf%7D%28x%29
将 https://www.zhihu.com/equation?tex=%5Cmathbf%7Bf%7D 一阶泰勒展开：

https://www.zhihu.com/equation?tex=%5Cmathbf%7Bf%7D%28x%2Bh%29%3D%5Cmathbf%7Bf%7D%28x%29%2B%5Cmathbf%7BJ%7D%28x%29+h+%2B+O%28h%5ETh%29+%5C%5C+
去掉高阶项，带入到 https://www.zhihu.com/equation?tex=F :

https://www.zhihu.com/equation?tex=F%28x%2Bh%29%5Capprox+L%28h%29%3D%5Cfrac%7B1%7D%7B2%7D%5Cmathbf%7Bf%7D%5ET%5Cmathbf%7Bf%7D%2Bh%5ET%5Cmathbf%7BJ%7D%5Cmathbf%7Bf%7D%2B%5Cfrac%7B1%7D%7B2%7Dh%5ET%5Cmathbf%7BJ%7D%5ET%5Cmathbf%7BJ%7Dh+%5C%5C
阻尼法的的思想是再加入一个阻尼项 https://www.zhihu.com/equation?tex=+%5Cfrac%7B1%7D%7B2%7D%5Cmu+h%5ETh ：

https://www.zhihu.com/equation?tex=h%3Darg%5Cmin_h+%7B%5Cmathbf%7BG%7D%28h%29%7D+%3D+arg%5Cmin_h+%5Cfrac%7B1%7D%7B2%7D%5Cmathbf%7Bf%7D%5ET%5Cmathbf%7Bf%7D%2Bh%5ET%5Cmathbf%7BJ%7D%5ET%5Cmathbf%7Bf%7D%2B%5Cfrac%7B1%7D%7B2%7Dh%5ET%5Cmathbf%7BJ%7D%5ET%5Cmathbf%7BJ%7Dh+%2B+%5Cfrac%7B1%7D%7B2%7D%5Cmu+h%5ETh%5C%5C
对上式求偏导数，并令为０．

https://www.zhihu.com/equation?tex=%5Cmathbf%7BJ%7D%5ETf%2B%5Cmathbf%7BJ%7D%5ET%5Cmathbf%7BJ%7Dh%2B%5Cmu+h%3D+0+%5C%5C+%28%5Cmathbf%7BJ%7D%5ET%5Cmathbf%7BJ%7D%2B%5Cmu+%5Cmathbf%7BI%7D%29h%3D-%5Cmathbf%7BJ%7D%5ETf%5C%5C+%28%5Cmathbf%7BH%7D%2B%5Cmu+%5Cmathbf%7BI%7D+%29h%3D-g

阻尼参数的作用有：
1. 对与 https://www.zhihu.com/equation?tex=%5Cmu%3E0 ， https://www.zhihu.com/equation?tex=%28%5Cmathbf%7BH%7D%2B%5Cmu+%5Cmathbf%7BI%7D+%29 正定，保证了是梯度下降的方向。
2. 当较大时： https://www.zhihu.com/equation?tex=h%5Capprox-%5Cfrac%7B1%7D%7B%5Cmu%7Dg%3D-%5Cfrac%7B1%7D%7B%5Cmu%7DF%27%28x%29 ，其实就是梯度、最速下降法，当离最终结果较远的时候，很好。
3. 当较小时，方法接近与高斯牛顿，当离最终结果很近时，可以获得二次收敛速度，很好。看来，的选取很重要。初始时，取

https://www.zhihu.com/equation?tex=%5Cmathbf%7BA%7D_0%3D%5Cmathbf%7BJ%7D%28x_0%29%5ET%5Cmathbf%7BJ%7D%28x_0%29%5C%5C+%5Cmu_0%3D%5Ctau+%5Ccdot+%5Cmax_i%5C%7Ba_%7Bii%7D%5E0%5C%7D
其他情况，利用cost增益来确定:

https://www.zhihu.com/equation?tex=%5Cvarrho%3D%5Cfrac%7BF%28x%29-f%28x%2Bh%29%7D%7B+L%280%29-L%28h%29%7D+%5C%5C
迭代终止条件
１．一阶导数为０： https://www.zhihu.com/equation?tex=F%27%28x%29%3Dg%28x%29%3D0，使用 https://www.zhihu.com/equation?tex=%5Cparallel+g%28x%29+%5Cparallel+%3C+%5Cvarepsilon_1 , https://www.zhihu.com/equation?tex=+%5Cvarepsilon_1 是设定的终止条件；
２．ｘ变化步距离足够小， https://www.zhihu.com/equation?tex=%5Cparallel+h%5Cparallel%3D%5Cparallel+x_%7Bnew%7D-x+%5Cparallel+%3C+%5Cvarepsilon_2%28+%5Cparallel+x+%5Cparallel+%2B+%5Cvarepsilon_2+%29 ；
３．超过最大迭代次数。LM算法的步骤为
begin
https://www.zhihu.com/equation?tex=k%3A%3D0%2C%5C+%5Cnu%3A%3D2%3B%5C+x%3A%3Dx_0
https://www.zhihu.com/equation?tex=H%3A%3DJ%5ETJ%3B%5C+g%3A%3DJ%5ETf
https://www.zhihu.com/equation?tex=found%3D%28%5Cparallel+g+%5Cparallel+_%5Cinfty+%3C+%5Cvarepsilon_1%29%3B%5C+%5Cmu%3A%3D%5Ctau%5Ccdot+max%5C%7Ba_%7Bii%7D%5C%7D
while(not found) and k < kmax
https://www.zhihu.com/equation?tex=k%3A%3D+k+%2B1%3B+Solve%28H%2B%5Cmu+I%29+h+%3D+-g
   if( https://www.zhihu.com/equation?tex=%5Cparallel+h%5Cparallel%3C+%5Cvarepsilon_2%28+%5Cparallel+x+%5Cparallel+%2B+%5Cvarepsilon_2+%29
         found = true;
   else
https://www.zhihu.com/equation?tex=x_%7Bnew%7D%3A%3D+x%2Bh
https://www.zhihu.com/equation?tex=%5Cvarrho%3D%5Cfrac%7BF%28x%29-f%28x%2Bh%29%7D%7B+L%280%29-L%28h%29%7D+
         if( https://www.zhihu.com/equation?tex=%5Cvarrho%3E0+ ) {判断能不能接收这一步}
https://www.zhihu.com/equation?tex=x%3A%3Dx_%7Bnew%7D
https://www.zhihu.com/equation?tex=H%3DJ%5ETJ%3B+%5C+g%3A%3DJ%5ETf
https://www.zhihu.com/equation?tex=found+%3D+%28%5Cparallel+g+%5Cparallel_%5Cinfty+%3C+%5Cvarepsilon_1%29
https://www.zhihu.com/equation?tex=%5Cmu%3A%3D%5Cmu+%5Ccdot+max%5C%7B%5Cfrac13%2C+1-%282%5Cvarrho-1%29%5E3+%5C%7D%3B%5C+%5Cnu%3D2
         else
https://www.zhihu.com/equation?tex=%5Cmu%3A%3D%5Cmu+%5Ccdot+%5Cnu%3B+%5Cnu+%3D+2+%5Ccdot+%5Cnu end

2. 算法实现

问题：（高斯牛顿同款问题）非线性方程: https://www.zhihu.com/equation?tex=y%3Dexp%28ax%5E2%2Bbx%2Bc%29 ，给定 https://www.zhihu.com/equation?tex=N 组观测数据 https://www.zhihu.com/equation?tex=%5C%7Bx_i%2Cy_i%5C%7D ，求系数 https://www.zhihu.com/equation?tex=X%3D%5B+a%2Cb%2Cc+%5D%5ET .
分析：令 https://www.zhihu.com/equation?tex=f%28X%29%3Dy-exp%28ax%5E2%2Bbx%2Bc%29 ,N组数据可以组成一个大的非线性方程组

https://www.zhihu.com/equation?tex=F%28X%29%3D%5Cleft%5B%5Cbegin%7Barray%7D%5Bc%5D%7Bc%7D+y_1-exp%28ax_1%5E2%2Bbx_1%2Bc%29%5C%5C+%5Cdots%5C%5C+y_N-exp%28ax_N%5E2%2Bbx_N%2Bc%29%5Cend%7Barray%7D+%5Cright%5D
我们可以构建一个最小二乘问题：

https://www.zhihu.com/equation?tex=x+%3D+%5Cmathrm%7Barg%7D%5Cmin_%7Bx%7D%5Cfrac%7B1%7D%7B2%7D%5Cparallel+F%28X%29+%5Cparallel%5E2 .
要求解这个问题，根据推导部分可知，需要求解雅克比。
https://www.zhihu.com/equation?tex=J%28X%29%3D%5Cleft%5B%5Cbegin%7Bmatrix%7D+-x_1%5E2exp%28ax_1%5E2%2Bbx_1%2Bc%29+%26+-x_1exp%28ax_1%5E2%2Bbx_1%2Bc%29+%26-exp%28ax_1%5E2%2Bbx_1%2Bc%29+%5C%5C+%5Cdots+%26+%5Cdots+%26+%5Cdots+%5C%5C+-x_N%5E2exp%28ax_N%5E2%2Bbx_N%2Bc%29+%26+-x_Nexp%28ax_N%5E2%2Bbx_N%2Bc%29+%26-exp%28ax_N%5E2%2Bbx_N%2Bc%29+%5Cend%7Bmatrix%7D+%5Cright%5D

使用推导部分所述的步骤就可以进行解算。代码实现：
/**
* This file is part of LevenbergMarquardt Solver.
*
* Copyright (C) 2018-2020 Dongsheng Yang <ydsf16@buaa.edu.cn> (Beihang University)
* For more information see <https://github.com/ydsf16/LevenbergMarquardt>
*
* LevenbergMarquardt is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* LevenbergMarquardt is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with LevenbergMarquardt. If not, see <http://www.gnu.org/licenses/>.
*/

#include <iostream>
#include <eigen3/Eigen/Core>
#include <eigen3/Eigen/Dense>
#include <opencv2/opencv.hpp>
#include <eigen3/Eigen/Cholesky>
#include <chrono>

/* 计时类 */
class Runtimer{
public:
inline void start()
{
   t_s_= std::chrono::steady_clock::now();
}

inline void stop()
{
   t_e_ = std::chrono::steady_clock::now();
}

inline double duration()
{
   return std::chrono::duration_cast<std::chrono::duration<double>>(t_e_ - t_s_).count() * 1000.0;
}

private:
std::chrono::steady_clock::time_point t_s_; //start time ponit
std::chrono::steady_clock::time_point t_e_; //stop time point
};

/*优化方程 */
class LevenbergMarquardt{
public:
LevenbergMarquardt(double* a, double* b, double* c):
a_(a), b_(b), c_(c)
{
   epsilon_1_ = 1e-6;
   epsilon_2_ = 1e-6;
   max_iter_ = 50;
   is_out_ = true;
}

void setParameters(double epsilon_1, double epsilon_2, int max_iter, bool is_out)
{
   epsilon_1_ = epsilon_1;
   epsilon_2_ = epsilon_2;
   max_iter_ = max_iter;
   is_out_ = is_out;
}

void addObservation(const double& x, const double& y)
{
   obs_.push_back(Eigen::Vector2d(x, y));
}

void calcJ_fx()
{
   J_ .resize(obs_.size(), 3);
   fx_.resize(obs_.size(), 1);

   for ( size_t i = 0; i < obs_.size(); i ++)
   {
         const Eigen::Vector2d& ob = obs_.at(i);
         const double& x = ob(0);
         const double& y = ob(1);
         double j1 = -x*x*exp(*a_ * x*x + *b_*x + *c_);
         double j2 = -x*exp(*a_ * x*x + *b_*x + *c_);
         double j3 = -exp(*a_ * x*x + *b_*x + *c_);
         J_(i, 0 ) = j1;
         J_(i, 1) = j2;
         J_(i, 2) = j3;
         fx_(i, 0) = y - exp( *a_ *x*x + *b_*x +*c_);
   }
}

void calcH_g()
{
   H_ = J_.transpose() * J_;
   g_ = -J_.transpose() * fx_;
}

double getCost()
{
   Eigen::MatrixXd cost= fx_.transpose() * fx_;
   return cost(0,0);
}

double F(double a, double b, double c)
{
   Eigen::MatrixXd fx;
   fx.resize(obs_.size(), 1);

   for ( size_t i = 0; i < obs_.size(); i ++)
   {
         const Eigen::Vector2d& ob = obs_.at(i);
         const double& x = ob(0);
         const double& y = ob(1);
         fx(i, 0) = y - exp( a *x*x + b*x +c);
   }
   Eigen::MatrixXd F = 0.5 * fx.transpose() * fx;
   return F(0,0);
}

double L0_L( Eigen::Vector3d& h)
{
      Eigen::MatrixXd L = -h.transpose() * J_.transpose() * fx_ - 0.5 * h.transpose() * J_.transpose() * J_ * h;
      return L(0,0);
}

void solve()
{
   int k = 0;
   double nu = 2.0;
   calcJ_fx();
   calcH_g();
   bool found = ( g_.lpNorm<Eigen::Infinity>() < epsilon_1_ );

   std::vector<double> A;
   A.push_back( H_(0, 0) );
   A.push_back( H_(1, 1) );
   A.push_back( H_(2,2) );
   auto max_p = std::max_element(A.begin(), A.end());
   double mu = *max_p;

   double sumt =0;

   while ( !found && k < max_iter_)
   {
         Runtimer t;
         t.start();

         k = k +1;
         Eigen::Matrix3d G = H_ + mu * Eigen::Matrix3d::Identity();
         Eigen::Vector3d h = G.ldlt().solve(g_);

         if( h.norm() <= epsilon_2_ * ( sqrt(*a_**a_ + *b_**b_ + *c_**c_ ) +epsilon_2_ ) )
            found = true;
         else
         {
            double na = *a_ + h(0);
            double nb = *b_ + h(1);
            double nc = *c_ + h(2);

            double rho =( F(*a_, *b_, *c_) - F(na, nb, nc) )/ L0_L(h);

            if( rho > 0)
            {
               *a_ = na;
               *b_ = nb;
               *c_ = nc;
               calcJ_fx();
               calcH_g();

               found = ( g_.lpNorm<Eigen::Infinity>() < epsilon_1_ );
               mu = mu * std::max<double>(0.33, 1 - std::pow(2*rho -1, 3));
               nu = 2.0;
            }
            else
            {
               mu = mu * nu;
               nu = 2*nu;
            }// if rho > 0
         }// if step is too small

         t.stop();
         if( is_out_ )
         {
            std::cout << &#34;Iter: &#34; << std::left <<std::setw(3) << k << &#34; Result: &#34;<< std::left <<std::setw(10)<< *a_ << &#34; &#34; << std::left <<std::setw(10)<< *b_ << &#34; &#34; << std::left <<std::setw(10) << *c_ <<
            &#34; step: &#34; << std::left <<std::setw(14) << h.norm() << &#34; cost: &#34;<< std::left <<std::setw(14)<< getCost() << &#34; time: &#34; << std::left <<std::setw(14) << t.duration()<<
            &#34; total_time: &#34;<< std::left <<std::setw(14) << (sumt += t.duration()) << std::endl;
         }
   } // while

   if( found== true)
         std::cout << &#34;\nConverged\n\n&#34;;
   else
         std::cout << &#34;\nDiverged\n\n&#34;;

}//function

Eigen::MatrixXd fx_;
Eigen::MatrixXd J_; // 雅克比矩阵
Eigen::Matrix3d H_; // H矩阵
Eigen::Vector3d g_;

std::vector< Eigen::Vector2d> obs_; // 观测

/* 要求的三个参数 */
double* a_, *b_, *c_;

/* parameters */
double epsilon_1_, epsilon_2_;
int max_iter_;
bool is_out_;
};//class LevenbergMarquardt
int main(int argc, char **argv) {
const double aa = 0.1, bb = 0.5, cc = 2; // 实际方程的参数
double a =0.0, b=0.0, c=0.0; // 初值

/* 构造问题 */
LevenbergMarquardt lm(&a, &b, &c);
lm.setParameters(1e-10, 1e-10, 100, true);

/* 制造数据 */
const size_t N = 100; //数据个数
cv::RNG rng(cv::getTickCount());
for( size_t i = 0; i < N; i ++)
{
   /* 生产带有高斯噪声的数据　*/
   double x = rng.uniform(0.0, 1.0) ;
   double y = exp(aa*x*x + bb*x + cc) + rng.gaussian(0.05);

   /* 添加到观测中　*/
   lm.addObservation(x, y);
}
/* 用LM法求解 */
lm.solve();

return 0;
}

ChuanXin 发表于 2021-12-15 07:12

最开始的最小二乘法那里，求和参数为i，但是数学表达式里面却不含i

mypro334 发表于 2021-12-15 07:21

接受新的一步后，\mu值的更新的依据是什么，或者针对这个问题有什么好的参考？

franciscochonge 发表于 2021-12-15 07:25

下面用的黑体，表示矩阵

mastertravels77 发表于 2021-12-15 07:31

你好，公式应该是这样F(x+h)=L(h)=0.5fTf + hTjTf +0.5*hTjTjh，你公式里是hTjf，少了个转置，代码中没少

acecase 发表于 2021-12-15 07:39

确实，公式有此问题

zifa2003293 发表于 2021-12-15 07:41

请问一下cost增益是什么原理？搜索了一下没搜到这方面的资料

ainatipen 发表于 2021-12-15 07:49

damping factor的初始值选取为什么是列出的那个方程，是什么原理呢

kyuskoj 发表于 2021-12-15 07:59

http://www2.imm.dtu.dk/documents/ftp/tr99/tr05_99.pdf可以读读这个

LiteralliJeff 发表于 2021-12-15 08:00

这个推导里面为何导数为0的点就是极小值呀，还有可能时极大值或者鞍点呢

页: [1] 2

Unity开发者联盟's Archiver

[优化]Levenberg-Marquardt 最小二乘优化