SVM算法详解

IT圈老男孩1 发表于 2021-12-2 08:37

线性模型

线性可分训练集

一个训练数据集线性可分是指：https://www.zhihu.com/equation?tex=%5C%7B%28x_i%2Cy_i%29%5C%7D_%7Bi%3D1%5Csim+N%7D%2C%5Cexists%28w%2Cb%29，使对https://www.zhihu.com/equation?tex=%5Cforall+i%3D1%5Csim+N，有

[*]若https://www.zhihu.com/equation?tex=y_i%3D%2B1，则https://www.zhihu.com/equation?tex=w%5ETx_i%2Bb%5Cge0
[*]若https://www.zhihu.com/equation?tex=y_i%3D-1，则https://www.zhihu.com/equation?tex=w%5ETx_i%2Bb%3C0
即(公式1)
对于线性可分的数据集，我们需要划一条线来分割两类不同的样本。但是分割的线有无数条，我们怎么判断哪一条线更好呢？

很多人认为第二条线是最好的。但是因为根据免费午餐定理，这三条曲线是一样好的。那么我们为什么会认为第二条曲线是最好的呢？这是因为我们在研究问题之前，对此问题存在先验假设。有很多先验假设认为第二条直线比其余两条要好。我们只考虑其中一种假设，即假设训练样本的位置在特征空间有测量误差。如下图所示：

假设红色的叉和圆圈的真实位置为红色虚线圆圈，则线3和1都会分类错误，而2不会，这说明2号线更能抵御训练样本位置的误差。
那么2号线是怎么画出来的呢？
支持向量机的创造者Vapnik是这样回答的，它首先将直线向一侧平行移动，直到它叉到一个或几个样本为止；之后再向另一侧移动，直到叉到一个或多个样本未知。

我们要找的2号线找的是使得间隔最大的且位于间隔中间的线。
在多维的情况下，直线变为超平面。
之后我们将支持向量机转化为一个优化问题，优化问题为：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%09%26%5Cmin+%5Cfrac%7B1%7D%7B2%7D%7C%7Cw%7C%7C%5E2%5C%5C%3Cbr%3E%09%26%5Coperatorname%7Bs.t.%7D+%5Cquad+y_i%5Bw%5ET_i%2Bb%5D%5Cge1%3Cbr%3E%5Cend%7Baligned%7D
那么这是怎么得到的呢？那面我们详细讨论一下：
事实一：https://www.zhihu.com/equation?tex=w%5ETb%2Bb%3D0与https://www.zhihu.com/equation?tex=aw%5ETx%2Bab%3D0是同一个平面，https://www.zhihu.com/equation?tex=a%5Cin+R%5E%2B。即若https://www.zhihu.com/equation?tex=%28w%2Cb%29满足公式1，则https://www.zhihu.com/equation?tex=%28aw%2Cab%29也满足公式一。
公式1：事实二：点到平面的距离公式。
向量到超平面https://www.zhihu.com/equation?tex=w%5ETx%2Bb%3D0的距离：

https://www.zhihu.com/equation?tex=d+%3D+%5Cfrac%7B%7Cw%5ETx_0%2Bb%7C%7D%7B%7C%7Cw%7C%7C%7D
当为支持向量时，我们要做的就是最大化。
根据事实1，我们可以用https://www.zhihu.com/equation?tex=a去缩放：

https://www.zhihu.com/equation?tex=%28w%2Cb%29%5Crightarrow+%28aw%2Cab%29
最终使在支持向量上上，有：

此时支持向量与平面距离：

https://www.zhihu.com/equation?tex=d+%3D+%5Cfrac%7B1%7D%7B%7C%7Cw%7C%7C%7D
因为最大化https://www.zhihu.com/equation?tex=%5Cfrac%7B1%7D%7B%7C%7Cw%7C%7C%7D相当于最小化https://www.zhihu.com/equation?tex=%7C%7Cw%7C%7C%5E2，所以得到上述的目标函数。
下面看约束条件是如何得到的。
因为在上面的描述中我们有，对于所有的支持向量，我们有

所以对于非支持向量，我们有：

https://www.zhihu.com/equation?tex=%7Cw%5ETx_0%2Bb%7C%3E1
又因为：，所以综上我们有

https://www.zhihu.com/equation?tex=y_i%5Bw%5ETx_i%2Bb%5D+%3D+%7Cw%5ETx_0%2Bb%7C%5Cge+1
这样我们就得到了上面提到的优化问题。
这个优化问题为凸优化问题中的二次优化问题。
二次规划问题：

[*]目标函数是二次项
[*]限制条件是一次项
这样就会导致要么无解，要么只有一个极值。
非线性可分

我们改写目标函数和约束条件，使其变为：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%26%5Cmin%26%5Cquad%5Cfrac%7B1%7D%7B2%7D%7C%7Cw%7C%7C%5E2%2BC%5Csum_%7Bi%3D1%7D%5EN%5Cxi_i%5C%5C%3Cbr%3E%26%5Coperatorname%7Bs.t.%7D%5Cquad+%26y_i%5Bw%5ETx_i%2Bb%5D%5Cge+1-%5Cxi_i%5C%5C%3Cbr%3E%26%5Cquad+%26%5Cxi_i%5Cge0%3Cbr%3E%5Cend%7Baligned%7D
其中https://www.zhihu.com/equation?tex=%5Cxi_i称为松弛变量，https://www.zhihu.com/equation?tex=C%5Csum_%7Bi%3D1%7D%5EN%5Cxi_i称为正则项。
非线性问题

对于下图所示的问题，我们不能找到一个很好的直线将两类分开：

但是我们可以将其映射到高维空间中的点，然后在高维空间中寻找直线。
我们定义一个从低维到高维的映射：

https://www.zhihu.com/equation?tex=x%5Cxrightarrow%7B%5Cphi%7D%5Cphi%28x%29
其中为低维向量，而为一个高维映射。
下面我们举一个例子：如下图所示的异或问题

这个问题我们在二维空间里无法找到一条直线将其分开。
在上图中我们令四个点分别为：

https://www.zhihu.com/equation?tex=x_1+%3D+%5Cbegin%7Bbmatrix%7D0%5C%5C0%5Cend%7Bbmatrix%7D%2Cx_2+%3D+%5Cbegin%7Bbmatrix%7D1%5C%5C1%5Cend%7Bbmatrix%7D%5Cin+C_1

https://www.zhihu.com/equation?tex=x_3+%3D+%5Cbegin%7Bbmatrix%7D1%5C%5C0%5Cend%7Bbmatrix%7D%2Cx_4+%3D+%5Cbegin%7Bbmatrix%7D0%5C%5C1%5Cend%7Bbmatrix%7D%5Cin+C_2
我们令

https://www.zhihu.com/equation?tex=%5Cphi%28x%29%3A+x+%3D+%5Cbegin%7Bbmatrix%7Da%5C%5Cb%5Cend%7Bbmatrix%7D%5Cxrightarrow%7B%5Cphi%7D%5Cphi%28x%29+%3D+%5Cbegin%7Bbmatrix%7Da%5E2%5C%5Cb%5E2%5C%5Ca%5C%5Cb%5C%5Cab%5Cend%7Bbmatrix%7D
则经过映射得到：

https://www.zhihu.com/equation?tex=%5Cphi%28x_1%29%3D+%5Cbegin%7Bbmatrix%7D0%5C%5C0%5C%5C0%5C%5C0%5C%5C0%5Cend%7Bbmatrix%7D%2C%5Cphi%28x_2%29%3D+%5Cbegin%7Bbmatrix%7D1%5C%5C1%5C%5C1%5C%5C1%5C%5C1%5Cend%7Bbmatrix%7D

https://www.zhihu.com/equation?tex=%5Cphi%28x_3%29%3D+%5Cbegin%7Bbmatrix%7D1%5C%5C0%5C%5C1%5C%5C0%5C%5C0%5Cend%7Bbmatrix%7D%2C%5Cphi%28x_4%29%3D+%5Cbegin%7Bbmatrix%7D0%5C%5C1%5C%5C0%5C%5C1%5C%5C0%5Cend%7Bbmatrix%7D
我们可以令

https://www.zhihu.com/equation?tex=w%3D+%5Cbegin%7Bbmatrix%7D-1%5C%5C-1%5C%5C-1%5C%5C-1%5C%5C6%5Cend%7Bbmatrix%7D%2Cb%3D1
来达到区分的目的。
有证明显示：在越高维度情况下，找打一个线性超平面来将样本分开的概率越大。我们如何选取https://www.zhihu.com/equation?tex=%5Cphi，我们将选择为无限维。但是为无限维，将为无限维，优化问题将不可做。
我们可以不知道无限维映射的显式表达，我们只要知道一个核函数：

https://www.zhihu.com/equation?tex=K%28x_1%2Cx_2%29+%3D+%5Cphi%28x_1%29%5ET%5Cphi%28x_2%29
下面的优化问题：

仍然可解。
在SVM中常用的核函数：

https://www.zhihu.com/equation?tex=K%28x_1%2Cx_2%29+%3D+e%5E%7B-%5Cfrac%7B%7C%7Cx_1-x_2%7C%7C%5E2%7D%7B2%5Csigma%5E2%7D%7D+%3D+%5Cphi%28x_1%29%5ET%5Cphi%28x_2%29
为高斯核函数

https://www.zhihu.com/equation?tex=K%28x_1%2Cx_2%29+%3D+%28x_1%5ETx_2%2B1%29%5Ed+%3D+%5Cphi%28x_1%29%5ET%5Cphi%28x_2%29
为多项式核函数，为阶数。
核函数https://www.zhihu.com/equation?tex=K必须满足某些条件才能被拆成内积的形式：
https://www.zhihu.com/equation?tex=K%28x_1%2Cx_2%29能被写成https://www.zhihu.com/equation?tex=%5Cphi%28x_1%29%5ET%5Cphi%28x_2%29的充要条件为：

[*]https://www.zhihu.com/equation?tex=K%28x_1%2Cx_2%29%3DK%28x_2%2Cx_1%29
[*]https://www.zhihu.com/equation?tex=%5Cforall+c_i%5Cin+R%2Cx_i%28i%3D1%5Csim+N%29，有$$\sum_{i=1}^N\sum_{i=1}^Nc_ic_jK(x_i,x_j)\ge 0$$
原问题和对偶问题

原问题

最小化：
限制条件：

[*]https://www.zhihu.com/equation?tex=g_i%28w%29%5Cle0%28i%3D1%5Csim+K%29
[*]https://www.zhihu.com/equation?tex=h_i%28w%29%3D0%28i%3D1%5Csim+M%29
对偶问题

定义：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3EL%28w%2C%5Calpha%2C%5Cbeta%29+%26%3D+f%28w%29+%2B+%5Csum_%7Bi%3D1%7D%5EK%5Calpha_ig_i%28w%29%2B%5Csum_%7Bi%3D1%7D%5EM%5Cbeta_ih_i%28w%29%5C%5C%3Cbr%3E%26%3D+f%28w%29+%2B+%5Calpha%5ETg%28w%29%2B%5Cbeta%5ETh%28w%29%3Cbr%3E%5Cend%7Baligned%7D
对偶问题的定义
最大化：https://www.zhihu.com/equation?tex=%5Ctheta%28%5Calpha%2C%5Cbeta%29%3D%5Cinf_w%28w%2C%5Calpha%2C%5Cbeta%29，https://www.zhihu.com/equation?tex=%5Cinf表示下界
限制条件：https://www.zhihu.com/equation?tex=%5Calpha_i%5Cge+0%2C%5Cbeta_i%5Cge0
原问题和对偶问题解的关系

定理：如果https://www.zhihu.com/equation?tex=w%5E%5Cstar是原问题的解，而https://www.zhihu.com/equation?tex=%5Calpha%5E%5Cstar%2C%5Cbeta%5E%5Cstar是对偶问题的解，则有：

https://www.zhihu.com/equation?tex=f%28w%5E%5Cstar%29%5Cge+%5Ctheta%28%5Calpha%5E%5Cstar%2C%5Cbeta%5E%5Cstar%29
证明：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%5Ctheta%28%5Calpha%5E%5Cstar%2C%5Cbeta%5E%5Cstar%29+%26%3D+%5Cinf_w+L%28w%2C%5Calpha%5E%5Cstar%2C%5Cbeta%5E%5Cstar%29%5C%5C%3Cbr%3E%26%5Cle+L%28w%5E%5Cstar%2C%5Calpha%5E%5Cstar%2C%5Cbeta%5E%5Cstar%29+%3D+f%28w%5E%5Cstar%29+%2B+%5Csum_%7Bi%3D1%7D%5EK%5Calpha%5E%5Cstar_ig_i%28w%5E%5Cstar%29%2B%5Csum_%7Bi%3D1%7D%5EM%5Cbeta%5E%5Cstar_ih_i%28w%5E%5Cstar%29%5C%5C%3Cbr%3E%26%5Cle+f%28w%5E%5Cstar%29%3Cbr%3E%5Cend%7Baligned%7D
定义：

https://www.zhihu.com/equation?tex=G+%3D+f%28w%5E%5Cstar%29+-+%5Ctheta%28%5Calpha%5E%5Cstar%2C%5Cbeta%5E%5Cstar%29%5Cge0

https://www.zhihu.com/equation?tex=G叫做原问题与对偶问题的间距。对于某些特定优化问题，可以证明：https://www.zhihu.com/equation?tex=G%3D0。
强对偶定理：若为凸函数，且https://www.zhihu.com/equation?tex=g%28w%29%3DAw%2Bb%2Ch%28w%29%3DCw%2Bd，则此优化问题的原问题与对偶问题的间距为。即

https://www.zhihu.com/equation?tex=f%28w%5E%5Cstar%29+%3D+%5Ctheta%28%5Calpha%5E%5Cstar%2C%5Cbeta%5E%5Cstar%29
此时我们易得对https://www.zhihu.com/equation?tex=%5Cforall+i%3D1%5Csim+K：

[*]或者https://www.zhihu.com/equation?tex=%5Calpha%5E%5Cstar_i%3D0
[*]或者https://www.zhihu.com/equation?tex=g%5E%7B%5Cstar%7D_i%28w%5E%5Cstar%29%3D0
这被称为KKT条件。
利用对偶问题求解SVM

我们先复习一下原问题：

根据原问题的定义形式，我们将上述问题改写：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%26%5Cmin%26%5Cquad%5Cfrac%7B1%7D%7B2%7D%7C%7Cw%7C%7C%5E2-C%5Csum_%7Bi%3D1%7D%5EN%5Cxi_i%5C%5C%3Cbr%3E%26%5Coperatorname%7Bs.t.%7D%26%5Cquad+1%2B%5Cxi_i-y_i+w%5ET%5Cphi%28x_i%29-y_ib%5Cle+0%5C%5C%3Cbr%3E%26%5Cquad+%26%5Cxi_i%5Cle0%3Cbr%3E%5Cend%7Baligned%7D
凸函数定义：

https://www.zhihu.com/equation?tex=f%28%5Clambda+x_1%2B%281-%5Clambda%29x_2%29%5Cle+%5Clambda+f%28x_1%29%2B%281-%5Clambda%29f%28x_2%29
我们SVM的对偶问题为：
最大化：

https://www.zhihu.com/equation?tex=%5Ctheta%28%5Calpha%2C%5Cbeta%29+%3D+%5Cinf_%7B%28w%2C%5Cxi_i%2Cb%29%7D%5C%7B%5Cfrac%7B1%7D%7B2%7D%7C%7Cw%7C%7C%5E2-C%5Csum_%7Bi%3D1%7D%5EN%5Cxi_i%2B%5Csum_%7Bi%3D1%7D%5EN%5Cbeta_i%5Cxi_i%2B%5Csum_%7Bi%3D1%7D%5EN%5Calpha_i%5B1%2B%5Cxi_i-y_i+w%5ET%5Cphi%28x_i%29-y_ib%5D%5C%7D
限制条件：

[*]https://www.zhihu.com/equation?tex=%5Calpha_i%5Cge+0
[*]https://www.zhihu.com/equation?tex=%5Cbeta_i%5Cge+0
我们要想求得https://www.zhihu.com/equation?tex=%5Ctheta%28%5Calpha%2C%5Cbeta%29，首先要最优化https://www.zhihu.com/equation?tex=w%2C%5Cxi_i%2Cb，对https://www.zhihu.com/equation?tex=L函数求偏导：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%5Cfrac%7B%5Cpartial+L%7D%7B%5Cpartial+w%7D+%3D+0%26%5CRightarrow+w+%3D+%5Csum_%7Bi%3D1%7D%5EN%5Calpha_iy_i%5Cphi%28x_i%29%5C%5C%3Cbr%3E%5Cfrac%7B%5Cpartial+L%7D%7B%5Cpartial+%5Cxi_i%7D+%3D+0%26%5CRightarrow+%5Cbeta_i%2B%5Calpha_i%3DC%5C%5C%3Cbr%3E%5Cfrac%7B%5Cpartial+L%7D%7B%5Cpartial+b%7D%3D0%26%5CRightarrow+%5Csum_%7Bi%3D1%7D%5EN%5Calpha_iy_i%3D0%3Cbr%3E%5Cend%7Baligned%7D
将其代入，得到：

https://www.zhihu.com/equation?tex=%5Ctheta%28%5Calpha%2C%5Cbeta%29+%3D+%5Csum_%7Bi%3D1%7D%5EN%5Calpha_i-%5Cfrac%7B1%7D%7B2%7D%5Csum_%7Bi%3D1%7D%5EN%5Csum_%7Bj%3D1%7D%5EN%5Calpha_i%5Calpha_jy_iy_jK%28x_i%2Cx_j%29
其中https://www.zhihu.com/equation?tex=K%28x_i%2Cx_j%29+%3D+%5Cphi%28x_i%29%5ET%5Cphi%28x_j%29。
所以对偶优化问题变为：
最大化：

限制条件：

[*]
[*]
这也是一个凸优化问题，解这个问题有一个标准的算法(SMO)算法。
这样我们就可以求出https://www.zhihu.com/equation?tex=%5Calpha_i%2Ci%3D1%5Csim+N，但是我们要求的是和，那么我们应该如何求出https://www.zhihu.com/equation?tex=w%2Cb呢，我们可以用之前得到的https://www.zhihu.com/equation?tex=w+%3D+%5Csum_%7Bi%3D1%7D%5EN%5Calpha_iy_i%5Cphi%28x_i%29，但问题是我们并不知道https://www.zhihu.com/equation?tex=%5Cphi%28x_i%29。
但是在判断样本属于哪一类的时候我们并不需要知道，假设有测试样本，我们知道：

[*]若https://www.zhihu.com/equation?tex=w%5ET%5Cphi%28x%29%2Bb%5Cge0，则https://www.zhihu.com/equation?tex=y%3D%2B1
[*]若https://www.zhihu.com/equation?tex=w%5ET%5Cphi%28x%29%2Bb%3C0，则https://www.zhihu.com/equation?tex=y%3D-1
而

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3Ew%5ET%5Cphi%28x%29+%26%3D+%5Csum_%7Bi%3D1%7D%5EN%5Calpha_iy_i%5Cphi%28x_i%29%5ET%5Cphi%28x%29%5C%5C%3Cbr%3E%26%3D+%5Csum_%7Bi%3D1%7D%5EN%5Calpha_iy_iK%28x_i%2Cx%29%3Cbr%3E%5Cend%7Baligned%7D
但是应该怎么算呢？
应用KKT条件，我们有

[*]要么https://www.zhihu.com/equation?tex=%5Cbeta_i%3D0，要么https://www.zhihu.com/equation?tex=%5Cxi_i%3D0
[*]要么https://www.zhihu.com/equation?tex=%5Calpha_i%3D0，要么https://www.zhihu.com/equation?tex=1%2B%5Cxi_i-y_iw%5ET%5Cphi%28x_i%29-y_ib%3D0
我们取一个https://www.zhihu.com/equation?tex=0%3C%5Calpha_i%3CC%5CRightarrow+%5Cbeta_i%3DC-%5Calpha_i%3E0，
此时https://www.zhihu.com/equation?tex=%5Cbeta_i%5Cneq0%5CRightarrow%5Cxi_i%3D0，因为https://www.zhihu.com/equation?tex=%5Calpha_i%5Cneq0%5CRightarrow+b+%3D+y_i+-+%5Csum_%7Bj%3D1%7D%5EN%5Calpha_jy_jK%28x_i%2Cx_j%29。也可以找到所有不等于的，求得取平均。
SMO算法

最大化：

限制条件：

[*]
[*]
因为我们要优化的变量()很多，所以我们每次迭代只选择几个变量进行更新；又因为我们的是有约束的优化问题，所以每次更新可能会破坏我们的约束条件，为了不破坏我们的约束条件，我们每次至少选择两个变量进行优化。因此我们每次选择两个变量来进行优化。
两个变量二次规划的求解过程

[*]选择两个变量，其他变量固定
[*]SMO将对偶问题转化成一系列子问题

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%3Cbr%3E%5Cmin+_%7B%5Calpha_%7B1%7D%2C+%5Calpha_%7B2%7D%7D+%26+W%5Cleft%28%5Calpha_%7B1%7D%2C+%5Calpha_%7B2%7D%5Cright%29%3D%5Cfrac%7B1%7D%7B2%7D+K_%7B11%7D+%5Calpha_%7B1%7D%5E%7B2%7D%2B%5Cfrac%7B1%7D%7B2%7D+K_%7B22%7D+%5Calpha_%7B2%7D%5E%7B2%7D%2By_%7B1%7D+y_%7B2%7D+K_%7B12%7D+%5Calpha_%7B1%7D+%5Calpha_%7B2%7D+%5C%5C%3Cbr%3E%3Cbr%3E%26+-%5Cleft%28%5Calpha_%7B1%7D%2B%5Calpha_%7B2%7D%5Cright%29%2By_%7B1%7D+%5Calpha_%7B1%7D+%5Csum_%7Bi%3D3%7D%5E%7BN%7D+y_%7Bi%7D+%5Calpha_%7Bi%7D+K_%7Bi+1%7D%2By_%7B2%7D+%5Calpha_%7B2%7D+%5Csum_%7Bi%3D3%7D%5E%7BN%7D+y_%7Bi%7D+%5Calpha_%7Bi%7D+K_%7Bi+2%7D+%5C%5C%3Cbr%3E%3Cbr%3E%5Ctext+%7B+s.t.+%7D+%26+%5Calpha_%7B1%7D+y_%7B1%7D%2B%5Calpha_%7B2%7D+y_%7B2%7D%3D-%5Csum_%7Bi%3D3%7D%5E%7BN%7D+y_%7Bi%7D+%5Calpha_%7Bi%7D%3D%5Czeta+%5C%5C%3Cbr%3E%3Cbr%3E%26+0+%5Cleq+%5Calpha_%7Bi%7D+%5Cleq+C%2C+i%3D1%2C2%3Cbr%3E%5Cend%7Baligned%7D

[*]根据约束条件，可以表示为的函数
[*]优化问题有解析解
[*]基于初始可行解https://www.zhihu.com/equation?tex=%5Calpha_1%5E%7Bold%7D%2C%5Calpha_2%5E%7Bold%7D，可以得到
两个变量，约束条件用二维空间中的图形表示：

下面首先考虑第一种情况，根据不等式条件的取值范围：

其中

https://www.zhihu.com/equation?tex=L+%3D+%5Cmax%280%2C%5Calpha_2%5E%7Bold%7D-%5Calpha_1%5E%7Bold%7D%29%5Cquad+H+%3D+%5Cmin%28C%2CC%2B%5Calpha_2%5E%7Bold%7D-%5Calpha_1%5E%7Bold%7D%29
同理对于第二种情况，根据不等式条件的取值范围：

其中

https://www.zhihu.com/equation?tex=L+%3D+%5Cmax%280%2C%5Calpha_2%5E%7Bold%7D%2B%5Calpha_1%5E%7Bold%7D-C%29%5Cquad+H+%3D+%5Cmin%28C%2C%5Calpha_2%5E%7Bold%7D%2B%5Calpha_1%5E%7Bold%7D%29
下面开始求解，求得的过程为：

[*]先求沿着约束方向未经剪辑时的https://www.zhihu.com/equation?tex=%5Calpha_2%5E%7Bnew%2Cunc%7D
[*]再求剪辑后的
我们记
https://www.zhihu.com/equation?tex=g%28x%29%3D%5Csum_%7Bi%3D1%7D%5EN%5Calpha_iy_iK%28x_i%2Cx%29%2Bb，即我们的判别表达式，令：

为输入的预测值和真实输出https://www.zhihu.com/equation?tex=y的差。
为了简便，引进记号：

https://www.zhihu.com/equation?tex=v_i+%3D+%5Csum_%7Bj%3D3%7D%5EN%5Calpha_jy_jK%28x_i%2Cx_j%29+%3D+g%28x_i%29+-+%5Csum_%7Bj%3D1%7D%5E2%5Calpha_jy_jK%28x_i%2Cx_j%29-b
目标函数写成：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%3Cbr%3EW%5Cleft%28%5Calpha_%7B1%7D%2C+%5Calpha_%7B2%7D%5Cright%29%3D%26+%5Cfrac%7B1%7D%7B2%7D+K_%7B11%7D+%5Calpha_%7B1%7D%5E%7B2%7D%2B%5Cfrac%7B1%7D%7B2%7D+K_%7B22%7D+%5Calpha_%7B2%7D%5E%7B2%7D%2By_%7B1%7D+y_%7B2%7D+K_%7B12%7D+%5Calpha_%7B1%7D+%5Calpha_%7B2%7D+%5C%5C%3Cbr%3E%3Cbr%3E%26-%5Cleft%28%5Calpha_%7B1%7D%2B%5Calpha_%7B2%7D%5Cright%29%2By_%7B1%7D+v_%7B1%7D+%5Calpha_%7B1%7D%2By_%7B2%7D+v_%7B2%7D+%5Calpha_%7B2%7D%3Cbr%3E%3Cbr%3E%5Cend%7Baligned%7D
由https://www.zhihu.com/equation?tex=%5Calpha_1y_1+%3D+%5Czeta-%5Calpha_2y_2及https://www.zhihu.com/equation?tex=y_i%5E2%3D1，我们得https://www.zhihu.com/equation?tex=%5Calpha_1+%3D+%28%5Czeta-y_2%5Calpha_2%29y_1，代入上式得到只是的函数的目标函数：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3EW%28%5Calpha_2%29+%26%3D+%5Cfrac%7B1%7D%7B2%7DK_%7B11%7D%28%5Czeta-%5Calpha_2y_2%29%5E2+%2B+%5Cfrac%7B1%7D%7B2%7DK_%7B22%7D%5Calpha_2%5E2%2By_2K_%7B12%7D%28%5Czeta-%5Calpha_2y_2%29%5Calpha_2%5C%5C%3Cbr%3E%26-%28%5Czeta-%5Calpha_2y_2%29y_1-%5Calpha_2%2Bv_1%28%5Czeta-%5Calpha_2y_2%29%2By_2v_2%5Calpha_2%3Cbr%3E%5Cend%7Baligned%7D
对求导并令其等于，得：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%3Cbr%3E%26%5Cleft%28K_%7B11%7D%2BK_%7B22%7D-2+K_%7B12%7D%5Cright%29+%5Calpha_%7B2%7D%3Dy_%7B2%7D%5Cleft%28y_%7B2%7D-y_%7B1%7D%2B%5Czeta+K_%7B11%7D-%5Czeta+K_%7B12%7D%2Bv_%7B1%7D-v_%7B2%7D%5Cright%29+%5C%5C%3Cbr%3E%3Cbr%3E%26%3Dy_%7B2%7D%5Cleft%5By_%7B2%7D-y_%7B1%7D%2B%5Czeta+K_%7B11%7D-%5Czeta+K_%7B12%7D%2B%5Cleft%28g%5Cleft%28x_%7B1%7D%5Cright%29-%5Csum_%7Bj%3D1%7D%5E%7B2%7D+y_%7Bj%7D+%5Calpha_%7Bj%7D+K_%7B1+j%7D-b%5Cright%29-%5Cleft%28g%5Cleft%28x_%7B2%7D%5Cright%29-%5Csum_%7Bj%3D1%7D%5E%7B2%7D+y_%7Bj%7D+%5Calpha_%7Bj%7D+K_%7B2+j%7D-b%5Cright%29%5Cright%5D%3Cbr%3E%3Cbr%3E%5Cend%7Baligned%7D
将https://www.zhihu.com/equation?tex=%5Czeta%3D%5Calpha_1%5E%7Bold%7Dy_1%2B%5Calpha_2%5E%7Bold%7Dy_2代入：

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%3Cbr%3E%5Cleft%28K_%7B11%7D%2BK_%7B22%7D-2+K+12%5Cright%29+%5Calpha_%7B2%7D%5E%7Bn+e+w%2C+u+n+c%7D+%26%5Cleft.%3Dy_%7B2%7D%5Cleft%28%5Cleft%28K_%7B11%7D%2BK_%7B22%7D-2+K_%7B12%7D%5Cright%29+%5Calpha_%7B2%7D%5E%7B%5Ctext+%7Bold+%7D%7D+y_%7B2%7D%2By_%7B2%7D-y_%7B1%7D%2Bg%5Cleft%28x_%7B1%7D%5Cright%29-g%5Cleft%28x_%7B2%7D%5Cright%29%5Cright%29%5Cright%29+%5C%5C%3Cbr%3E%3Cbr%3E%26%3D%5Cleft%28K_%7B11%7D%2BK_%7B22%7D-2+K_%7B12%7D%5Cright%29+%5Calpha_%7B2%7D%5E%7B%5Ctext+%7Bold+%7D%7D%2By_%7B2%7D%5Cleft%28E_%7B1%7D-E_%7B2%7D%5Cright%29%3Cbr%3E%3Cbr%3E%5Cend%7Baligned%7D
将https://www.zhihu.com/equation?tex=%5Ceta+%3D+K_%7B11%7D%2BK_%7B22%7D-2K_%7B12%7D代入：

https://www.zhihu.com/equation?tex=%5Calpha_2%5E%7Bnew%2Cunc%7D+%3D+%5Calpha_2%5E%7Bold%7D%2B%5Cfrac%7By_2%28E_1-E_2%29%7D%7B%5Ceta%7D
我们之后对解进行剪辑：

https://www.zhihu.com/equation?tex=%5Cbegin%7Bcases%7D%3Cbr%3EH%2C%5Cquad+%26%5Calpha_2%5E%7Bnew%2Cunc%7D%3EH%5C%5C%3Cbr%3E%5Calpha_2%5E%7Bnew%2Cunc%7D%2C%5Cquad%26L%5Cle+%5Calpha_2%5E%7Bnew%2Cunc%7D%5Cle+H%5C%5C%3Cbr%3EL%2C%5Cquad%26+%5Calpha_2%5E%7Bnew%2Cunc%7D%3CL%3Cbr%3E%5Cend%7Bcases%7D
得到的解：

https://www.zhihu.com/equation?tex=%5Calpha_1%5E%7Bnew%7D+%3D+%5Calpha_1%5E%7Bold%7D%2By_1y_2%28%5Calpha_2%5E%7Bold%7D-%5Calpha_2%5E%7Bnew%7D%29
关于KKT条件，我们有：

https://www.zhihu.com/equation?tex=%5Cbegin%7Barray%7D%7Br%7D%3Cbr%3E%3Cbr%3E%5Calpha_%7Bi%7D%3D0+%5CLeftrightarrow+y_%7Bi%7D+g%5Cleft%28x_%7Bi%7D%5Cright%29+%5Cgeqslant+1+%5C%5C%3Cbr%3E%3Cbr%3E0%3C%5Calpha_%7Bi%7D%3CC+%5CLeftrightarrow+y_%7Bi%7D+g%5Cleft%28x_%7Bi%7D%5Cright%29%3D1+%5C%5C%3Cbr%3E%3Cbr%3E%5Calpha_%7Bi%7D%3DC+%5CLeftrightarrow+y_%7Bi%7D+g%5Cleft%28x_%7Bi%7D%5Cright%29+%5Cleqslant+1%3Cbr%3E%3Cbr%3E%5Cend%7Barray%7D
下面我们计算阈值和
由KKT条件，如果https://www.zhihu.com/equation?tex=0%3C%5Calpha_1%5E%7Bnew%7D%3CC，则

https://www.zhihu.com/equation?tex=%5Csum_%7Bi%3D1%7D%5EN%5Calpha_iy_iK_%7Bi1%7D%2Bb%3Dy_1

https://www.zhihu.com/equation?tex=b_1%5E%7Bnew%7D+%3D+y_1-%5Csum_%7Bi%3D3%7D%5EN%5Calpha_iy_iK_%7Bi1%7D-%5Calpha_1%5E%7Bnew%7Dy_1K_%7B11%7D-%5Calpha_2%5E%7Bnew%7Dy_2K_%7B21%7D

https://www.zhihu.com/equation?tex=E_1+%3D+%5Csum_%7Bi%3D3%7D%5EN%5Calpha_iy_iK_%7Bi1%7D%2B%5Calpha_1%5E%7Bold%7Dy_1K_%7B11%7D%2B%5Calpha_2%5E%7Bold%7Dy_2K_%7B21%7D%2Bb%5E%7Bold%7D-y_1

https://www.zhihu.com/equation?tex=E_1的表达式与https://www.zhihu.com/equation?tex=b_1%5E%7Bnew%7D相结合，得：

https://www.zhihu.com/equation?tex=b_1%5E%7Bnew%7D+%3D+-E_1-y_1K_%7B11%7D%28%5Calpha_1%5E%7Bnew%7D-%5Calpha_1%5E%7Bold%7D%29-y_2K_%7B21%7D%28%5Calpha_2%5E%7Bnew%7D-%5Calpha_2%5E%7Bold%7D%29%2Bb%5E%7Bold%7D
同理，如果https://www.zhihu.com/equation?tex=0%3C%5Calpha_2%5E%7Bnew%7D%3CC，则

https://www.zhihu.com/equation?tex=%5Cbegin%7Baligned%7D%3Cbr%3E%3Cbr%3E%260%3C%5Calpha_%7B2%7D%5E%7B%5Ctext+%7Bnew+%7D%7D%3CC+%5C%5C%3Cbr%3E%3Cbr%3E%26b_%7B2%7D%5E%7B%5Ctext+%7Bnew+%7D%7D%3D-E_%7B2%7D-y_%7B1%7D+K_%7B12%7D%5Cleft%28%5Calpha_%7B1%7D%5E%7B%5Ctext+%7Bnew+%7D%7D-%5Calpha_%7B1%7D%5E%7B%5Ctext+%7Bold+%7D%7D%5Cright%29-y_%7B2%7D+K_%7B22%7D%5Cleft%28%5Calpha_%7B2%7D%5E%7B%5Ctext+%7Bnew+%7D%7D-%5Calpha_%7B2%7D%5E%7B%5Ctext+%7Bold+%7D%7D%5Cright%29%2Bb%5E%7B%5Ctext+%7Bold+%7D%7D+%5C%5C%3Cbr%3E%3Cbr%3E%26E_%7Bi%7D%5E%7B%5Ctext+%7Bnew+%7D%7D%3D%5Csum_%7BS%7D+y_%7Bj%7D+%5Calpha_%7Bj%7D+K%5Cleft%28x_%7Bi%7D%2C+x_%7Bj%7D%5Cright%29%2Bb%5E%7B%5Ctext+%7Bnew+%7D%7D-y_%7Bi%7D%3Cbr%3E%3Cbr%3E%5Cend%7Baligned%7D
如果同时满足条件https://www.zhihu.com/equation?tex=0%3C%5Calpha_i%5E%7Bnew%7D%3CC，那么https://www.zhihu.com/equation?tex=b_1%5E%7Bnew%7D%3Db_2%5E%7Bnew%7D。如果是或者https://www.zhihu.com/equation?tex=C，那么https://www.zhihu.com/equation?tex=b_1%5E%7Bnew%7D%2Cb_2%5E%7Bnew%7D以及它们之间的数都是符合KKT条件的阈值，这时选择它们的中点作为。
在每次完成两个变量的优化之后，还必须更新对应的值，并将它们保存在列表中。值的更新要用到值，以及所有支持向量对应的https://www.zhihu.com/equation?tex=%5Calpha_j：

https://www.zhihu.com/equation?tex=E_i%5E%7Bnew%7D+%3D+%5Csum_%7BS%7Dy_ja_jK%28x_i%2Cx_j%29%2Bb%5E%7Bnew%7D-y_i
其中，https://www.zhihu.com/equation?tex=S是所有支持向量的集合。
关于此方面的解释
变量的启发式选择

SMO算法在每个子问题中选择两个变量优化，其中至少一个变量是违反KKT条件的。

[*]第一个变量的选择：外循环
[*]违反KKT条件最严重的样本点
[*]检验样本点是否满足KKT条件：
[*]https://www.zhihu.com/equation?tex=%5Cbegin%7Barray%7D%7Br%7D+%5Calpha_%7Bi%7D%3D0+%5CLeftrightarrow+y_%7Bi%7D+g%5Cleft%28x_%7Bi%7D%5Cright%29+%5Cgeqslant+1+%5C%5C+0%3C%5Calpha_%7Bi%7D%3CC%5CLeftrightarrow+y_%7Bi%7D+g%5Cleft%28x_%7Bi%7D%5Cright%29%3D1+%5C%5C+%5Calpha_%7Bi%7D%3DC+%5CLeftrightarrow+y_%7Bi%7D+g%5Cleft%28x_%7Bi%7D%5Cright%29+%5Cleqslant+1%5Cend%7Barray%7D\end{array}$$

该检验是在https://www.zhihu.com/equation?tex=%5Cepsilon范围内进行的。在检验过程中，外层循环首先遍历满足条件https://www.zhihu.com/equation?tex=0%3C%5Calpha_i%3CC的样本点，即在间隔边界上的支持向量点，检验它们是否满足KKT条件，如果这些样本点都满足KKT条件，那么遍历整个训练集，检验他们是否满足KKT条件。

[*]第二个变量的检查：内循环
[*]选择的标准是希望能使目标函数有足够大的变化
[*]即对应https://www.zhihu.com/equation?tex=%7CE_1-E_2%7C最大

[*]如果内循环通过上述方法找到的点不能使目标函数有足够大的下降，则：遍历间隔边界上的样本点，测试目标函数下降
[*]如果下降不大，则遍历所有样本点
[*]如果依然下降不大，则丢弃外循环点，重新选择

算法实现
def kernal(a, b):
# 高斯核, s为方差
return a.dot(b)
class SVM:
def __init__(self,data,label):
   self.data = data
   self.label = label
   self.len = data.shape
   self.E = np.zeros((self.len,1))
   self.alpha = np.random.random((self.len,1))
   self.b = 0
def f(self,i):
   # 传入索引
   s = 0
   for k in range(self.len):
         s += self.alpha*self.label*kernal(self.data,self.data)
   return s + self.b
def error(self):
   # 计算E
   for j in range(self.len):
         E = 0
         for i in range(self.len):
            E += self.alpha*self.label*kernal(self.data,self.data)
         self.E = E + self.b - self.label

def bound(self, i,j,alpha_i, alpha_j, C):
   # 传入两个索引，为需要优化的alpha的索引
   # 求解alpha_j的范围
   # C为正则化项系数
   if self.label != self.label:
         L, H = np.max(), np.min()
   else:
         L, H = np.max(), np.min()
   return (L,H)

def update(self, i,j, C):
   # 更新alpha和b
   # 传入索引
   eta = kernal(self.data,self.data) + kernal(self.data,self.data) - 2*kernal(self.data,self.data)

   # 下面需要补充逻辑关系
   alpha_old_i = self.alpha
   alpha_old_j = self.alpha
   self.alpha = self.alpha + self.label*(self.E-self.E)/eta
   L, H = self.bound(i,j,alpha_old_i,alpha_old_j,C)
   if self.alpha >= H:
         self.alpha = H
   elif self.alpha <= L:
         self.alpha = L
   else:
         self.alpha = self.alpha
   self.alpha = self.alpha + self.label*self.label*(alpha_old_j-self.alpha)

   b1 = self.b - self.E - self.label*(self.alpha-alpha_old_i)*kernal(self.data,self.data)-self.label*(self.alpha-alpha_old_j)*kernal(self.data,self.data)
   b2 = self.b - self.E - self.label*(self.alpha-alpha_old_i)*kernal(self.data,self.data)-self.label*(self.alpha-alpha_old_j)*kernal(self.data,self.data)

   self.b = (b1+b2)/2
   self.error()
def smo(self, epsilon, max_iter, C):
   for k in range(max_iter):
         I = np.intersect1d(np.argwhere(self.alpha>0),np.argwhere(self.alpha<C))
         d = []
         for i in I:
            d.append(np.abs(self.label*self.f(i)-1))
         i = np.argmax(np.array(d))
         j = np.argmax(self.E-self.E)
         self.update(i,j,C)

         ge = np.array(>=1])
         eq = np.array(==1])
         le = np.array(<=1])
         Ge = []
         Eq = []
         Le = []
         for ig in ge:
            Ge.appned(self.label*self.f(ig))
         for ie in eq:
            Eq.append(self.label*self.f(ie))
         for il in le:
            Le.append(self.label*self.f(il))

         Ge = np.array(Ge)
         Eq = np.array(Eq)
         Le = np.array(Le)

         ne = np.sum(np.abs(Eq-1)>epsilon)
         ng1 = np.sum(np.abs(Ge-1)<0)
         ng2 = np.sum(np.abs(Ge-1)>epsilon)
         nl1 = np.sum(np.abs(Le-1)>0)
         nl2 = np.sum(np.abs(Le-1)>epsilon)
         if (ne+ng1+ng2+nl1+nl2)==0:
            break
def predict(self, x):
   s = 0
   for k in range(self.len):
         s += self.alpha*self.label*kernal(self.data,x)
   s = s + self.b
   if s >=0:
         return 1
   else:
         return -1
本文使用 Zhihu On VSCode 创作并发布

页: [1]

Unity开发者联盟's Archiver

SVM算法详解