Posted by Zephus on 2021-12-18 11:15

Optimization Theory, Assignment 1.1: Gradient Descent

First-Order Taylor Expansion of a Multivariate Real Function

Let $f$ be a multivariate real function defined on a domain $D$. If $f$ is continuous and has partial derivatives in a neighbourhood of a point $\overset{\frown}{\mathbf{x}} \in D$, then by Taylor's theorem its first-order Taylor expansion is:

$$f(\mathbf{x}) \approx p_1(\mathbf{x}) = f(\overset{\frown}{\mathbf{x}}) + [\nabla f(\overset{\frown}{\mathbf{x}})]^\mathrm{T}(\mathbf{x} - \overset{\frown}{\mathbf{x}}) \tag{1}$$
It can be shown that $p_1$ and $f$ have the same zeroth- and first-order derivatives at $\overset{\frown}{\mathbf{x}}$, while all higher-order derivatives of $p_1$ vanish. In other words, $p_1$ extracts all the information of $f$ at $\overset{\frown}{\mathbf{x}}$ up to first order.
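As a quick numerical illustration of (1), the sketch below compares $f$ with its first-order model $p_1$ near a point. It assumes the quadratic $f(x) = x_0^2/4 + x_1^2$ used later in this post; the names f, grad_f, x_hat and delta are only illustrative.

import numpy as np

# Illustrative check of equation (1): f(x) vs. its first-order Taylor model p1(x) around x_hat.
def f(x):
    return x[0] ** 2 / 4 + x[1] ** 2

def grad_f(x):                        # analytic gradient, used only for this check
    return np.array([x[0] / 2, 2 * x[1]])

x_hat = np.array([1.0, -0.5])
delta = np.array([0.01, 0.02])        # small displacement x - x_hat
x = x_hat + delta

p1 = f(x_hat) + grad_f(x_hat) @ delta
print(f(x), p1)                       # the two values agree up to O(||delta||^2)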
Gradient Descent

Rearranging equation (1) gives:

$$f(\mathbf{x}) - f(\overset{\frown}{\mathbf{x}}) \approx [\nabla f(\overset{\frown}{\mathbf{x}})]^\mathrm{T}(\mathbf{x} - \overset{\frown}{\mathbf{x}}) \tag{2}$$
The optimization goal of gradient descent is to make $f(\mathbf{x})$ as small as possible, i.e., to minimize the left-hand side of (2). Writing the right-hand side of (2) as an inner product gives:

$$\begin{aligned} \left[\nabla f(\overset{\frown}{\mathbf{x}})\right]^\mathrm{T}(\mathbf{x} - \overset{\frown}{\mathbf{x}}) &= \left\langle (f_{x_1}, f_{x_2}, \dots, f_{x_N}),\ (x_1 - \overset{\frown}{x}_1, x_2 - \overset{\frown}{x}_2, \dots, x_N - \overset{\frown}{x}_N) \right\rangle \\ &= \Vert\nabla f(\overset{\frown}{\mathbf{x}})\Vert_2\,\Vert\mathbf{x} - \overset{\frown}{\mathbf{x}}\Vert_2\cos(\theta) \end{aligned} \tag{3}$$
When $\theta = \pi$, i.e. $\mathbf{x} - \overset{\frown}{\mathbf{x}} = -\lambda\nabla f(\overset{\frown}{\mathbf{x}})$ with $\lambda > 0$, expression (3) attains its minimum for a fixed step length $\Vert\mathbf{x} - \overset{\frown}{\mathbf{x}}\Vert_2$, so the gradient descent update rule is:

$$\mathbf{x} = \overset{\frown}{\mathbf{x}} - \lambda\nabla f(\overset{\frown}{\mathbf{x}}), \quad \lambda > 0 \tag{4}$$
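To make the $\cos(\theta)$ argument in (3) concrete, here is a minimal sketch, assuming the same quadratic objective as above: it compares the value of $f$ after a step along the negative gradient with steps of the same Euclidean length in a few random directions. The variable names are illustrative, not part of the assignment.

import numpy as np

def f(x):
    return x[0] ** 2 / 4 + x[1] ** 2

x_hat = np.array([1.0, 1.0])
grad = np.array([x_hat[0] / 2, 2 * x_hat[1]])   # analytic gradient at x_hat
lam = 0.01                                      # small step size, so the first-order term dominates

# Step along -grad versus steps of the same Euclidean length in random directions.
step_down = -lam * grad
others = [np.random.randn(2) for _ in range(5)]
others = [lam * np.linalg.norm(grad) * d / np.linalg.norm(d) for d in others]

print(f(x_hat + step_down))                     # typically the smallest value
print([float(f(x_hat + d)) for d in others])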
Numerical Gradient Based on Central Differences

To compute $\nabla f(\overset{\frown}{\mathbf{x}})$, we only need the partial derivative with respect to each component. When $f$ has an explicit expression, the analytical partial derivatives are easy to obtain. When the expression is unknown, the partial derivatives can be approximated numerically from the definition of the differential.

$$\mathrm{d}y = f(\overset{\frown}{\mathbf{x}}) - f(\overset{\frown}{\mathbf{x}} - \mathrm{d}\overset{\frown}{\mathbf{x}}) = \sum_i f_{\overset{\frown}{x}_i}\,\mathrm{d}\overset{\frown}{x}_i + o(\Vert\mathrm{d}\overset{\frown}{\mathbf{x}}\Vert_2) \tag{5}$$
Setting $\mathrm{d}\overset{\frown}{x}_i \neq 0$ while keeping all other components zero yields $f_{\overset{\frown}{x}_i}$. By the mean value theorem, the central difference is used to compute the numerical partial derivative:

$$f_{\overset{\frown}{x}_i} = \frac{f\!\left(\overset{\frown}{\mathbf{x}} + (0, \dots, \mathrm{d}\overset{\frown}{x}_i, \dots, 0)^\mathrm{T}\right) - f\!\left(\overset{\frown}{\mathbf{x}} - (0, \dots, \mathrm{d}\overset{\frown}{x}_i, \dots, 0)^\mathrm{T}\right)}{2\,\mathrm{d}\overset{\frown}{x}_i} \tag{6}$$
A common choice is $\mathrm{d}\overset{\frown}{x}_i = 10^{-7}$.
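As a sanity check of (6), the snippet below compares the central-difference estimate with the analytic partial derivative, using $\mathrm{d}\overset{\frown}{x}_i = 10^{-7}$; the quadratic and the variable names are only illustrative.

import numpy as np

def f(x):
    return x[0] ** 2 / 4 + x[1] ** 2

x = np.array([1.0, 2.0])
h = 1e-7                                  # dx_i, as suggested above

# Central difference along the first coordinate, equation (6).
e0 = np.array([1.0, 0.0])
num = (f(x + h * e0) - f(x - h * e0)) / (2 * h)
ana = x[0] / 2                            # analytic partial derivative of f w.r.t. x0
print(num, ana)                           # should agree to roughly 7-8 significant digits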
Final Code

#! -*- coding:utf-8 -*-

'''
@Author:      ZM
@Date and Time: 2021/11/7 9:08
@File:          GD.py
'''

import numpy as np
import matplotlib.pyplot as plt

'''Objective function f(x) = x0 * x0 / 4 + x1 * x1; in many cases the expression is unknown.
'''
def fun(x: np.ndarray):
    return x[0] * x[0] / 4 + x[1] * x[1]

'''Numerical gradient via central differences
'''
class GradFun:
    def __init__(self, f):
        self.f = f

    def __call__(self, x: np.ndarray):
        h = 1e-7                            # perturbation dx_i
        grad = np.zeros_like(x, dtype=x.dtype)

        for idx in range(x.size):
            tmp_val = x[idx]
            x[idx] = tmp_val + h
            fxh1 = self.f(x)                # f(x + h * e_idx)

            x[idx] = tmp_val - h
            fxh2 = self.f(x)                # f(x - h * e_idx)

            grad[idx] = (fxh1 - fxh2) / (2 * h)
            x[idx] = tmp_val                # restore the original component

        return grad

'''Visualization
'''

def plot_fun(f):
    '''Plot the surface of f
    '''
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    x1 = np.linspace(-1, 1, num=50)
    x2 = np.linspace(-1, 1, num=50)
    X1, X2 = np.meshgrid(x1, x2)
    FX = f(np.stack([X1, X2], axis=0))
    ax.plot_surface(X1, X2, FX, rstride=1, cstride=1, cmap='viridis', edgecolor='none')
    fig.savefig('fun.png')
    plt.close()

def plot_training_curve(data, best_data, name=None):
    '''Plot the search path of gradient descent
    '''
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    ax.scatter3D(data[:, 0], data[:, 1], data[:, 2])
    ax.scatter3D(*best_data)
    ax.plot3D(data[:, 0], data[:, 1], data[:, 2], '#FA7F6F')
    ax.set(title='Gradient descent')
    if name is not None:
        fig.savefig(f'{name}.png')
    plt.close()

grad_fun = GradFun(fun)

x = np.random.randn(2)                        # random initialization
alpha = 1.                                    # initial step size

min_alpha = 1e-3                              # minimum step size
cnt = 0                                       # counts steps without improvement
total_steps = 600
best_fx = fun(x)
log_data = [[*x, best_fx]]                    # records the search path for plotting
best_data = [*x, best_fx]
plot_fun(fun)

'''Gradient descent
'''
for step in range(total_steps):
    grad_x = grad_fun(x)
    x -= alpha * grad_x

    tmp_val = fun(x)
    log_data.append([*x, tmp_val])
    if best_fx > tmp_val:
        best_fx = tmp_val
        best_data = [*x, best_fx]
        cnt = 0
    else:
        cnt += 1
        if cnt > 30:    # no improvement for 30 consecutive steps: shrink the step size
            alpha *= 0.1
            alpha = max(alpha, min_alpha)
            cnt = 0

log_data = np.array(log_data, dtype='float32')
print(log_data)
plot_training_curve(log_data, best_data, name='GD')

Results



Figure 1: Plot of the function to be optimized

http://pic1.zhimg.com/v2-d5d27559da36a1005a2152643388b85c_r.jpg

Figure 2: Search path of gradient descent (GD)


- Sensitive to the initial point: a poor initialization may fail to converge.
- Low search efficiency: the iterates follow a zig-zag ("Z"-shaped) path (see the short sketch below).
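A minimal sketch of that zig-zag behaviour, assuming the same objective as above and a fixed step size: because the curvature along x1 is four times that along x0, a step size that suits the flat direction makes the iterate overshoot and oscillate in the steep one.

import numpy as np

def grad(x):                      # gradient of f(x) = x0**2 / 4 + x1**2
    return np.array([x[0] / 2, 2 * x[1]])

x = np.array([2.0, 1.0])
alpha = 0.9                       # fixed step size, tuned for the flat x0 direction
for _ in range(6):
    x = x - alpha * grad(x)
    print(x)                      # x0 shrinks steadily; x1 flips sign every step (zig-zag)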