各种数值最优化方法总结及其实例

BlaXuan · 发表于 2022-11-9 19:41

这几年的学习中，我偶尔会碰到一些比较复杂的、变量比较多的最优化求解问题；简单的最优化求解问题当然可以套用运筹学或凸优化的一些方法，但是往往很多问题是无法求出解析解的，只能用一些数值方法作近似求解；我觉得有必要总结一下自己的学习历程，以便日后查看。
最著名的数值优化方法当然是一系列的智能算法，以及因为近年机器学习的兴起而带动发展起来的、专门用于减轻调参难度的贝叶斯优化方法。本文会根据我自己的经验和认识总结一下四种算法：遗传算法、粒子群算法、模拟退火算法和贝叶斯优化。
数值最优化方法的共同点

数值最优化方法本质是用一些迭代算法去找到整个函数或一些实际问题的近似可能的最优解和最优函数值，而这个过程可以通俗理解为“先进的、有目的的随机搜索”。用数学式子来表示就是：
x_1^*,x_2^*,\cdots,x_n^*= \arg max  f(x_1,x_2,\cdots,x_n)
这个过程会经历一些不断试错的过程，中途可能会产生许多“比较可能”是最优的解（即局部最优解）；我们的算法要想方设法跳脱局部最优解，尽可能找到全局最优解。
故综合来看，数值最优化方法的理论有两个关键点：
1，按照一定的规则产生新的可行解x；产生的解x必须要先符合约束条件，再去研究它是否为最优解。
2，产生的新解x后要制定一套详细的规则决定是接受还是拒绝它。
因为当前的解 a 可能是局部最优解，使得新解 b有：f(b)<f(a)；如果就此停止迭代，可能找到的就不是全局最优解。这涉及到一个搜索策略的取舍问题：保守&冒险。保守的策略倾向于根据已知情况寻找最优解；比如我已经知道a附近的取值 f(a+\Delta x) 会越来越大，那我就尽量在这附近搜索。冒险的策略则是更愿意去搜索之前没有探索过的区域，这更可能发现全局最优解，跳出局部最优解；但是消耗时间会更长。
编程上的两个关键点：
1、如何产生新解，很大程度上影响了最后找到的最优解的质量；
2，要明确目标函数和变量的关系，准确表述出来。
这里有一个编程小技巧，如果自己手写优化算法，可以将约束条件以惩罚函数的形式巧妙地融合在目标函数中。比如需要让 x_1^2+x_2 \leq 3 ，则目标函数可以写成： f(x)+100000 \max( x_1^2+x_2 -3,0) ，只要满足约束，惩罚项会无限小；否则，无限大。
智能算法理论回顾

智能算法本质上通过对动物群体行为的研究，将其抽象成寻找最优解的过程。整个过程又被称为“启发式搜索”，即在搜索最优解的过程中利用到了原来搜索过程中得到的信息，且这个信息会改进我们的搜索过程。它虽然不能保证能找到全局最优解，但是能更接近全局最优，且能加快求解过程。
粒子群算法

核心思想是利用群体中的个体对信息的共享，使整个群体的运动在问题求解空间中产生从无序到有序的演化过程，从而获得问题的可行解。

这时，所有的鸟都会综合个体信息（自己所掌握的信息）和群体信息，对自己下一步的飞行速度做出决定：

我们可以将这个过程和最优解的求解术语结合起来：
各个粒子就是优化问题的候选解（还要查看是否符合约束条件才能成为可行解），借助位置和速度的概念可以产生新的候选解；适应度（即函数最优解）评价粒子的优劣；而个体和群体分别有适应度；每一轮中，当个体的适应度优于前一轮的群体适应度，则当前轮次的适应度会被替换掉。
总结：粒子群算法中的多个粒子表明它每一次迭代同时产生了多个可行解供我们选择（类似于遗传算法，又不同于模拟退火算法），它是通过这种方式尽量避免陷入局部最优的。
算法流程：

模拟退火算法

退火是一种金属热处理工艺，指的是将金属缓慢加热到一定温度，保持足够时间，然后以适宜速度冷却。
对于模拟退火算法，有两个关键点：
1、假设我们将搜索过程看作一个“工序”，那么搜索前期我们搜索的范围应该尽量大，搜索后期我们搜索的范围应该尽量小的。
2、对于当前解a和找到的新解b，若f(b)<f(a)，我们仍然要以一定概率接受它。
上面的两点是通过设置一个概率式子实现的：
P_t=\exp(-C_t|f(b)-f(a)|)
C_t  可以看作一个和时间有关的系数，时间越长，它越大；因为我们希望前期搜索范围大，那么p应该大些，那么 C_t 应该小些；后期则相反。

注意：不同于粒子群算法和遗传算法，模拟退火算法并没有给出产生新解的方法，我们需要自己根据情况设定。
算法流程：

模拟退火算法手写代码

例子：

首先要明确目标函数：使得总费用最小，因此需要写一个计算费用的函数，其中solution是一个1x20的向量，每一个元素代表该书应该去哪家书店购买；book_price 和transportation 分别存储书费和运费。
function fee=Caculate_Fee(solution,book_price,transportation)
%book_price :15 stores x  20 books
store_index=unique(solution);
transportation_fee=sum(transportation(store_index));
book_fee=0;
for j=1:size(book_price,2)
book_fee=book_fee+book_price(solution(j),j);
end
fee=book_fee+transportation_fee;
end接着，需要一个产生新解的函数；这里产生新解主要有三种方法，比较好理解，但是效果不一定非常好；至于每种方法的使用概率，可以根据需要自己调：
%利用三种方式，根据当前解随机生成新解
function new_solution=GenerateNewSolution(old_solution,stores_num,books_num)
new_solution=old_solution;
%生成一个随机数决定用哪种生成方式
choose_way=unifrnd(0,1);
%生成一些随机数
index=randperm(books_num);
if choose_way<= 1/15
index=sort(index(1:2));
%随机选择两个点交换位置
[location1,location2]=deal(index(1),index(2));
temp=new_solution(location1);
new_solution(location1)=new_solution(location2);
new_solution(location2)=temp;
elseif choose_way >(2/15) && choose_way <=(8/15)
index=sort(index(1:2));
%随机选择两个点,将这两点之间的顺序颠倒
[location1,location2]=deal(index(1),index(2));
slice=old_solution(location1:location2);
slice_back=slice(end:-1:1);
new_solution(location1:location2)=slice_back;
else
index=sort(index(1:3));
%随机选三个点，将前两个点之间的点移动到第三个点后
[location1,location2,location3]=deal(index(1),index(2),index(3));
slice=old_solution(location1:location2);
if location3 ~= books_num
      slice2=old_solution(location3+1:end);
      %删除
      new_solution([location1:location2,location3+1:books_num])=[];
      %重新拼接
      new_solution=[new_solution,slice,slice2];
else
      %删除
      new_solution(location1:location2)=[];
      %重新拼接
      new_solution=[new_solution,slice];
end
end
end最后是主函数：
%模拟退火寻找最优买书方案
clear all;
data=xlsread(&#34;buying_books_opt.xlsx&#34;,&#34;B2:V16&#34;);
books_price=data(:,1:end-1);
transportation=data(:,end);
[stores_num,books_num]=size(books_price);

%参数初始化
T0=2500;%初始温度
max_iter=8000;%最大迭代次数
T=T0;%迭代时的温度
alpha=0.9;%温度衰减系数
T_iter=1800;%每个温度下的迭代次数

%随机生成一个初始解
solution0=randi(stores_num,[1,books_num]);
cost0=Caculate_Fee(solution0,books_price,transportation);

%保存中间过程的变量
min_cost=cost0;%当前最小值
cost_result=zeros(max_iter,1);%记录外层循环找到的Min_cost
solution_result=zeros(max_iter,books_num);
best_solution=solution0;

%模拟退火主过程
for iter=1:max_iter
for t=1:T_iter
      NewSolution=GenerateNewSolution(solution0,stores_num,books_num);
      NewCost=Caculate_Fee(NewSolution,books_price,transportation);
      %如果新的花费更低，则更新；否则以一定概率更新
      if NewCost < cost0
         solution0=NewSolution;
         cost0=NewCost;
      else
         p=exp(-(NewCost-cost0)/T);
         if rand(1)<p
            solution0=NewSolution;
            cost0=NewCost;
         end
      end
      %判断是否更新找到的最优解
      if cost0< min_cost
         min_cost=cost0;
         best_solution=solution0;
      end
end
cost_result(iter,1)=min_cost;
solution_result(iter,:)=best_solution;
T=alpha*T;
end

disp(&#34;最佳方案是去以下书店依次购买对应书籍：&#34;);disp(best_solution)
disp(&#34;花费最少：&#34;);disp(min_cost)遗传算法

过程涉及到的术语很多，不过大体思想还是大同小异。同粒子群算法一样，遗传算法每迭代一次会产生大量群体（染色体）供我们进行选择，通过这种方式来尽可能避免局部最优解；通过交配、变异等方式会产生新的可行解，这些可行解可能可以帮助我们跳出局部最优解。
过程不再赘述。
智能算法调包实例

matlab内置了很多智能算法的函数，我们直接调用即可。
1，粒子群算法：[index1,fval1]=particleswarm(obj_fun,narvs,lb,ub,options);
其中index1为返回的最优解X，fval1为找到的最优值；obj_fun为需要优化的目标函数，narvs：要优化的变量数量;lb、ub：X的边界约束。
2，遗传算法： [index2,fval2]=ga(obj_fun,narvs,A,b,Aeq,beq,lb,ub);
不等式约束：A*x<=b  ；等式约束：Aeq*x=beq
3，模拟退火算法：[index3,fval3]=simulannealbnd(obj_fun,x0,lb,ub);
x0：初始值
书店买书的例题：

先设定好目标函数：（我直接用每个算法跑出了300个解，然后存储起来找最好的解）
function cost=cal_fee(store_index,data)
store_index=round(store_index);
book_price=data(:,1:end-1);
transportation=data(:,end);
book_cost=0;
for i=1:length(store_index)
book_cost=book_cost+book_price(store_index(i),i);
end
store=unique(store_index);
transportation_cost=sum(transportation(store));
cost=book_cost+transportation_cost;
end随后是一些基础设置：注意obj_fun要写成函数句柄的形式，@后面放的是需要优化的变量
narvs=20;
lb=ones(1,20);
ub=15*ones(1,20);
data=xlsread(&#34;buying_books_opt.xlsx&#34;,&#34;B2:V16&#34;);
obj_fun=@(store_index) cal_fee(store_index,data);主函数：
options=optimoptions(&#34;particleswarm&#34;,&#34;HybridFcn&#34;,@fmincon);

obj_array1=[];
input_array1=[];
obj_array2=[];
input_array2=[];
obj_array3=[];
input_array3=[];

times=300
for time=1:times
%初始值
if time==1
      x0=randi(20,[1,20]);
else
      x0=index3;
end
%粒子群算法
[index1,fval1]=particleswarm(obj_fun,narvs,lb,ub,options);
%遗传算法
[index2,fval2]=ga(obj_fun,narvs,[],[],[],[],lb,ub);
%模拟退火
[index3,fval3]=simulannealbnd(obj_fun,x0,lb,ub);

input_array1(time,:)=index1;
input_array2(time,:)=index2;
obj_array1(time)=fval1;
obj_array2(time)=fval2;
input_array3(time,:)=index3;

obj_array3(time)=fval3;

end
min(obj_array1)
min(obj_array2)
min(obj_array3)

料场选址

第一问比较简单，主要算出分别向每个工地运输多少水泥即可：
计算距离的函数：
%c_ij i=1,2 j=1,2,^,6  从i料场向j工地运输c_ij吨水泥
%c=[c11,c12,^,c16,c21,^c26]
function cost=cal_dist(c)
%保留三位小数，精确到KG
c=roundn(c,-2);
c=reshape(c,6,2)&#39;;
location1=[1.25,1.25;8.75,0.75;0.5,4.75;5.75,5;3,6.5;7.25,7.75];
location2=[5,1;2,7];
[minus1,minus2]=deal(location1-location2(1,:),location1-location2(2,:));
%到第一个料场的距离
dist1=bsxfun(@hypot,minus1(:,1),minus1(:,2));
%到第二个料场的距离
dist2=bsxfun(@hypot,minus2(:,1),minus2(:,2));
%从P运输水泥的总距离
Pdist=c(1,:)*dist1;
%从Q
Qdist=c(2,:)*dist2;
total_dist=Pdist+Qdist;
cost=total_dist;
end
主函数：（注意约束函数的书写，最好在草稿纸上写成矩阵的形式，不然很容易出错）
%c_ij i=1,2 j=1,2,^,6  从i料场向j工地运输c_ij吨水泥
%c=[c11,c12,^,c16,c21,^c26]
function cost=cal_dist(c)
%保留三位小数，精确到KG
c=roundn(c,-2);
c=reshape(c,6,2)&#39;;
location1=[1.25,1.25;8.75,0.75;0.5,4.75;5.75,5;3,6.5;7.25,7.75];
location2=[5,1;2,7];
[minus1,minus2]=deal(location1-location2(1,:),location1-location2(2,:));
%到第一个料场的距离
dist1=bsxfun(@hypot,minus1(:,1),minus1(:,2));
%到第二个料场的距离
dist2=bsxfun(@hypot,minus2(:,1),minus2(:,2));
%从P运输水泥的总距离
Pdist=c(1,:)*dist1;
%从Q
Qdist=c(2,:)*dist2;
total_dist=Pdist+Qdist;
cost=total_dist;
end

第二问：
计算距离的函数：
%c_ij i=1,2 j=1,2,^,6  从i料场向j工地运输c_ij吨水泥
%最后两个参数是供应商坐标
%var=[c11 c12 c13,^,c25,c26,x1,y1,x2,y2]
function cost=cal_dist3(var)
%保留三位小数，精确到KG
var=roundn(var,-2);
location2=var(end-3:end);
location2=reshape(location2,2,2);%注意reshape后形式的排布
location2=location2&#39;;
c=var(1:end-4);
c=reshape(c,6,2);
c=c&#39;;%(2,6)
location1=[1.25,1.25;8.75,0.75;0.5,4.75;5.75,5;3,6.5;7.25,7.75];

[minus1,minus2]=deal(location1-location2(1,:),location1-location2(2,:));
%到第一个料场的距离
dist1=bsxfun(@hypot,minus1(:,1),minus1(:,2));
%到第二个料场的距离
dist2=bsxfun(@hypot,minus2(:,1),minus2(:,2));
%从P运输水泥的总距离
Pdist=c(1,:)*dist1;
%从Q
Qdist=c(2,:)*dist2;
total_dist=Pdist+Qdist;
cost=total_dist;
end
主函数：
%运输水泥成本计算
%加选最优场地
narvs=16;
lb=zeros(1,16);
ub=20*ones(1,16);
obj_fun=@(c) cal_dist3(c);
options=optimoptions(&#34;ga&#34;,&#34;HybridFcn&#34;,@fmincon,&#34;PopulationSize&#34;,200);
A=[ones(1,6),zeros(1,10) ;zeros(1,6),ones(1,6),zeros(1,4)];
supply=[20;20];
Aeq=[eye(6),eye(6),zeros(6,4)];
demand=[3 5 4 7 6 11]&#39;;
c_array1=[];

times=5;
cost_array1=zeros(times,16);

for i=1:times

[c_min,fval]=ga(obj_fun,narvs,A,supply,Aeq,demand,lb,ub,[],[],options);

c_array1(i)=fval;
cost_array1(i,:)=c_min;
end
min(c_array1)

[min_num,min_index]=min(c_array1);
input=cost_array1(min_index,:);
input=roundn(input,-2);
loc=input(end-3:end);
loc=reshape(loc,2,2)&#39;;
c=input(1:end-4);
c=reshape(c,6,2)&#39;;
constrain1=sum(c,2)-supply;
constrain2=sum(c,1)-demand&#39;;
judge1=(constrain1<=20);
judge2=(constrain2==0);调包总结

如果按调包的情况来看，是遗传算法的效果最好；当然很多细节还是可以调整的，需要根据具体情况而定。
贝叶斯优化

近年来，深度学习愈发火爆，让其在各领域的应用越来越广；但是，深度学习模型的超参数非常的多，让我们在设置和应用模型时困难重重。因此，AutoML提出了一种方便我们调参的贝叶斯优化算法。
同样道理，我们可以将所有超参数当成输入变量X，需要优化的变量比如模型准确率、模型错选率当做模型的输出Y，我们假设超参数和需要优化的变量之间有一个隐含的、确定的函数关系： Y=f^*(X) ；贝叶斯优化利用高斯过程回归的方法根据选取的不同超参数和模型输出结果拟合出一个近似的函数： Y=f(X) ，希望随着采样的不断增多f(X)能趋近于 f^*(X) 。至于如何采样，贝叶斯优化算法有很多配套的方法，不再赘述。
其实贝叶斯优化方法本质上和智能算法的思想十分相近，都是一种“启发式优化”的思想，只不过它被更专注于应用在机器学习的调参领域，希望选择一些最优超参数，使得模型表现最好。
贝叶斯优化例子：神经网络调参

我们希望神经网络做回归任务，需要选择一些超参数，使得预测结果和真实结果的R-squre最小
from bayes_opt import BayesianOptimization
import tensorflow as tf

def nn_model(l1_rate,l2_rate,dropout_rate):
model = tf.keras.models.Sequential([
      tf.keras.layers.Dense(100, activation=&#39;relu&#39;, input_shape=x_train.shape[1:]),
      tf.keras.layers.BatchNormalization(),

      tf.keras.layers.Dense(100, activation=&#39;relu&#39;,kernel_regularizer=tf.keras.regularizers.l1_l2(l1=l1_rate,l2=l2_rate) ),
      tf.keras.layers.BatchNormalization(),
      tf.keras.layers.Dropout(dropout_rate),
      tf.keras.layers.Dense(80, activation=&#39;relu&#39;, kernel_regularizer=tf.keras.regularizers.l1_l2(l1=l1_rate,l2=l2_rate)),
      tf.keras.layers.Dropout(dropout_rate),

      tf.keras.layers.Dense(50, activation=&#39;relu&#39;, ),
      tf.keras.layers.BatchNormalization(),

      tf.keras.layers.Dense(y_train.shape[1], activation=&#39;sigmoid&#39;),
])
return model

def black_box_function(batch_size,lr,epochs,l1_rate,l2_rate,dropout_rate):

model=nn_model(l1_rate=l1_rate,l2_rate=l2_rate,dropout_rate=dropout_rate)
opt = tf.keras.optimizers.Adagrad(learning_rate=lr)

model.compile(loss=customed_mse,metrics=customed_mse,optimizer=opt)

history = model.fit(x_train_scaled,y_train_scaled, shuffle=False, workers=1,
                     validation_data=(x_val_scaled, y_val_scaled),verbose=False,
                     epochs=round(epochs), batch_size=round(batch_size))

pred=model.predict(x_val_scaled)

score=adjusted_r2score(y_val,pred,y_val_scaled)
score_sort=np.sort(np.array(score))

return score_sort[0]+score_sort[1]+score_sort[2]

# 贝叶斯优化调参

optimizer = BayesianOptimization(f=black_box_function, pbounds=pbounds, verbose=2, random_state=1)
optimizer.maximize(n_iter=30)
print(optimizer.max)

# 预测，用最好的参数再次构建模型

epochs = round(optimizer.max[&#39;params&#39;][&#39;epochs&#39;])
batch_size = round(optimizer.max[&#39;params&#39;][&#39;batch_size&#39;])
lr = optimizer.max[&#39;params&#39;][&#39;lr&#39;]
l1_rate=optimizer.max[&#39;params&#39;][&#39;l1_rate&#39;]
l2_rate=optimizer.max[&#39;params&#39;][&#39;l2_rate&#39;]
dropout_rate=optimizer.max[&#39;params&#39;][&#39;dropout_rate&#39;]

opt = tf.keras.optimizers.Adagrad(learning_rate=lr)
model1=nn_model(l1_rate=l1_rate,l2_rate=l2_rate,dropout_rate=dropout_rate)
model1.compile(loss=customed_mse, metrics=customed_mse, optimizer=opt)

history = model1.fit(x_train_scaled, y_train_scaled, shuffle=False, workers=1,
                  validation_data=(x_val_scaled, y_val_scaled), verbose=False,
                  epochs=round(epochs), batch_size=round(batch_size))

pred1 = model1.predict(x_train_scaled)
pred2 = model1.predict(x_test_scaled)

		自动登录	找回密码
密码			立即注册

各种数值最优化方法总结及其实例

本帖子中包含更多资源