A Detailed Explanation of the SVM Algorithm
Linear Models: Linearly Separable Training Sets
A training set $\{(x_i,y_i)\}_{i=1\sim N}$ is linearly separable if there exists $(w,b)$ such that for all $i=1\sim N$:
[*]if $y_i=+1$, then $w^Tx_i+b\ge0$;
[*]if $y_i=-1$, then $w^Tx_i+b<0$.
That is (Formula 1):

$$y_i[w^Tx_i+b]\ge 0$$
For a linearly separable data set, we need to draw a line that separates the two classes of samples. But there are infinitely many such lines — how do we judge which one is better?
Many people feel that line 2 is the best. Yet by the no-free-lunch theorem, the three lines are equally good, so why do we still prefer line 2? Because before studying a problem we already hold prior assumptions about it, and under many such priors line 2 beats the other two. Here we consider only one of those priors: assume the positions of the training samples in feature space carry measurement error. As shown in the figure below:
If the true positions of the red crosses and circles are the dashed red circles, then lines 1 and 3 misclassify them while line 2 does not. This shows that line 2 is more robust to errors in the positions of the training samples.
So how is line 2 drawn?
Vapnik, the creator of the support vector machine, answered as follows: first translate the line toward one side until it touches one or more samples; then translate it toward the other side until it touches one or more samples.
The line 2 we seek is the line that maximizes this margin while lying in the middle of it.
In higher dimensions, the line becomes a hyperplane.
We then cast the support vector machine as an optimization problem:

$$\begin{aligned}
&\min \frac{1}{2}||w||^2\\
&\operatorname{s.t.} \quad y_i[w^Tx_i+b]\ge1,\quad i=1\sim N
\end{aligned}$$
How is this obtained? Let us go through it in detail.
Fact 1: $w^Tx+b=0$ and $aw^Tx+ab=0$ describe the same hyperplane for any $a\in \mathbb{R}^+$. Hence, if $(w,b)$ satisfies Formula 1, so does $(aw,ab)$.
Fact 2: the point-to-hyperplane distance formula.
The distance from a point $x_0$ to the hyperplane $w^Tx+b=0$ is:

$$d = \frac{|w^Tx_0+b|}{||w||}$$

When $x_0$ is a support vector, what we want to do is maximize $d$.
By Fact 1, we may rescale with some $a$:

$$(w,b)\rightarrow (aw,ab)$$

so that on the support vectors we finally have $|w^Tx_0+b|=1$.
The distance from a support vector to the hyperplane is then:

$$d = \frac{1}{||w||}$$

Since maximizing $\frac{1}{||w||}$ is equivalent to minimizing $||w||^2$, we obtain the objective function above.
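As a quick numerical sketch of the distance formula above (the hyperplane and point below are arbitrary illustrative values, not from the text):

```python
import numpy as np

def distance_to_hyperplane(w, b, x0):
    # d = |w^T x0 + b| / ||w||
    return abs(w.dot(x0) + b) / np.linalg.norm(w)

# Illustrative hyperplane 3x + 4y - 5 = 0 and point (4, 3).
w = np.array([3.0, 4.0])
b = -5.0
x0 = np.array([4.0, 3.0])
d = distance_to_hyperplane(w, b, x0)  # |12 + 12 - 5| / 5 = 3.8
```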
Next we see how the constraints arise.
From the description above, for every support vector we have $|w^Tx_i+b|=1$,
so for every non-support vector we have:

$$|w^Tx_i+b|>1$$

Moreover, $y_i$ always has the same sign as $w^Tx_i+b$, so altogether:

$$y_i[w^Tx_i+b] = |w^Tx_i+b|\ge 1$$

This yields the optimization problem stated above.
This optimization problem is a quadratic programming problem, a class of convex optimization problems.
Quadratic programming:
[*]the objective function is quadratic;
[*]the constraints are linear.
Consequently, the problem either has no feasible solution or has a single (global) extremum.
The Linearly Non-Separable Case
We rewrite the objective and the constraints so that they become:

$$\begin{aligned}
\min\quad&\frac{1}{2}||w||^2+C\sum_{i=1}^N\xi_i\\
\operatorname{s.t.}\quad &y_i[w^Tx_i+b]\ge 1-\xi_i\\
&\xi_i\ge0
\end{aligned}$$

Here the $\xi_i$ are called slack variables, and $C\sum_{i=1}^N\xi_i$ is called the regularization term.
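The slack variables can be computed directly for any fixed separator: $\xi_i=\max(0,\,1-y_i(w^Tx_i+b))$, zero for points outside the margin and positive for violations. A minimal sketch with illustrative (not optimized) values:

```python
import numpy as np

# Illustrative data and a fixed separator w, b (not an SVM solution).
X = np.array([[2.0, 2.0], [0.5, 0.5], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 1.0])
b = -1.0

# xi_i = max(0, 1 - y_i (w^T x_i + b)); only margin violators get xi > 0.
margins = y * (X.dot(w) + b)
xi = np.maximum(0.0, 1.0 - margins)
```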
Nonlinear Problems
For problems like the one shown below, we cannot find a single straight line that separates the two classes well:
But we can map the samples to points in a higher-dimensional space and look for a separating hyperplane there.
We define a map from the low-dimensional space to a high-dimensional one:

$$x\xrightarrow{\phi}\phi(x)$$

where $x$ is the low-dimensional vector and $\phi(x)$ is its high-dimensional image.
Consider an example: the XOR problem shown below.
No straight line in the plane separates the two classes.
Label the four points in the figure as:

$$x_1 = \begin{bmatrix}0\\0\end{bmatrix},x_2 = \begin{bmatrix}1\\1\end{bmatrix}\in C_1$$

$$x_3 = \begin{bmatrix}1\\0\end{bmatrix},x_4 = \begin{bmatrix}0\\1\end{bmatrix}\in C_2$$

Let

$$\phi(x):\ x = \begin{bmatrix}a\\b\end{bmatrix}\xrightarrow{\phi}\phi(x) = \begin{bmatrix}a^2\\b^2\\a\\b\\ab\end{bmatrix}$$

The mapped points are:

$$\phi(x_1)= \begin{bmatrix}0\\0\\0\\0\\0\end{bmatrix},\phi(x_2)= \begin{bmatrix}1\\1\\1\\1\\1\end{bmatrix}$$

$$\phi(x_3)= \begin{bmatrix}1\\0\\1\\0\\0\end{bmatrix},\phi(x_4)= \begin{bmatrix}0\\1\\0\\1\\0\end{bmatrix}$$

We can then take

$$w= \begin{bmatrix}-1\\-1\\-1\\-1\\6\end{bmatrix},b=1$$

to separate the two classes.
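We can verify numerically that this $(w,b)$ separates the four mapped points (a direct transcription of the mapping and weights above):

```python
import numpy as np

def phi(x):
    # The mapping from the text: [a, b] -> [a^2, b^2, a, b, ab].
    a, b = x
    return np.array([a**2, b**2, a, b, a * b])

w = np.array([-1.0, -1.0, -1.0, -1.0, 6.0])
b = 1.0

points = [np.array([0.0, 0.0]), np.array([1.0, 1.0]),
          np.array([1.0, 0.0]), np.array([0.0, 1.0])]
signs = [int(np.sign(w.dot(phi(x)) + b)) for x in points]  # x1, x2 -> +1; x3, x4 -> -1
```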
It can be shown that the higher the dimension, the greater the probability of finding a linear hyperplane that separates the samples. How, then, should we choose $\phi$? We choose it to be infinite-dimensional. But if $\phi(x)$ is infinite-dimensional, $w$ is infinite-dimensional too, and the optimization problem becomes intractable.
Fortunately, we never need the explicit form of the infinite-dimensional map; we only need a kernel function:

$$K(x_1,x_2) = \phi(x_1)^T\phi(x_2)$$

With it, the optimization problem (in the dual form derived below) remains solvable.
Kernels commonly used with SVMs:

$$K(x_1,x_2) = e^{-\frac{||x_1-x_2||^2}{2\sigma^2}} = \phi(x_1)^T\phi(x_2)$$

is the Gaussian kernel, and

$$K(x_1,x_2) = (x_1^Tx_2+1)^d = \phi(x_1)^T\phi(x_2)$$

is the polynomial kernel, where $d$ is the degree.
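Both kernels are one-liners to evaluate (a minimal sketch; the test points are illustrative):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    # K(x1, x2) = exp(-||x1 - x2||^2 / (2 sigma^2))
    return np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))

def polynomial_kernel(x1, x2, d=2):
    # K(x1, x2) = (x1^T x2 + 1)^d
    return (x1.dot(x2) + 1) ** d

x = np.array([1.0, 2.0])
g = gaussian_kernel(x, x)                       # identical points give K = 1
p = polynomial_kernel(x, np.array([0.0, 0.0]))  # (0 + 1)^2 = 1
```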
A kernel $K$ must satisfy certain conditions in order to decompose into an inner product:
$K(x_1,x_2)$ can be written as $\phi(x_1)^T\phi(x_2)$ if and only if:
[*]$K(x_1,x_2)=K(x_2,x_1)$;
[*]$\forall c_i\in \mathbb{R},x_i\ (i=1\sim N)$, we have $$\sum_{i=1}^N\sum_{j=1}^Nc_ic_jK(x_i,x_j)\ge 0$$
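The second condition says every Gram matrix $K_{ij}=K(x_i,x_j)$ is positive semidefinite. We can spot-check this numerically for the Gaussian kernel on random points; this is a sketch, not a proof (the sample size and bandwidth are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Gram matrix of the Gaussian kernel with sigma = 1.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)

# Symmetric, and all eigenvalues nonnegative up to numerical error.
eigs = np.linalg.eigvalsh(K)
min_eig = eigs.min()
```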
The Primal and Dual Problems
The primal problem
Minimize: $f(w)$
Subject to:
[*]$g_i(w)\le0\ (i=1\sim K)$
[*]$h_i(w)=0\ (i=1\sim M)$
The dual problem
Define:

$$\begin{aligned}
L(w,\alpha,\beta) &= f(w) + \sum_{i=1}^K\alpha_ig_i(w)+\sum_{i=1}^M\beta_ih_i(w)\\
&= f(w) + \alpha^Tg(w)+\beta^Th(w)
\end{aligned}$$

The dual problem is then:
Maximize: $\theta(\alpha,\beta)=\inf_w L(w,\alpha,\beta)$, where $\inf$ denotes the infimum (greatest lower bound) over $w$
Subject to: $\alpha_i\ge 0$ (the $\beta_i$, which multiply equality constraints, are unrestricted)
The relation between the primal and dual solutions
Theorem: if $w^\star$ solves the primal problem and $\alpha^\star,\beta^\star$ solve the dual problem, then:

$$f(w^\star)\ge \theta(\alpha^\star,\beta^\star)$$

Proof:

$$\begin{aligned}
\theta(\alpha^\star,\beta^\star) &= \inf_w L(w,\alpha^\star,\beta^\star)\\
&\le L(w^\star,\alpha^\star,\beta^\star) = f(w^\star) + \sum_{i=1}^K\alpha^\star_ig_i(w^\star)+\sum_{i=1}^M\beta^\star_ih_i(w^\star)\\
&\le f(w^\star)
\end{aligned}$$

(the last step uses $g_i(w^\star)\le0$, $\alpha_i^\star\ge0$, and $h_i(w^\star)=0$). Define:

$$G = f(w^\star) - \theta(\alpha^\star,\beta^\star)\ge0$$

$G$ is called the duality gap between the primal and dual problems. For certain optimization problems, one can prove $G=0$.
Strong duality theorem: if $f(w)$ is convex and $g(w)=Aw+b,h(w)=Cw+d$, then the duality gap of the problem is $0$, i.e.

$$f(w^\star) = \theta(\alpha^\star,\beta^\star)$$

In that case it easily follows that for $\forall i=1\sim K$:
[*]either $\alpha^\star_i=0$,
[*]or $g_i(w^\star)=0$.
This is called the KKT condition (complementary slackness).
Solving the SVM via the Dual Problem
Let us first recall the primal problem: the soft-margin problem above, with the inputs mapped by $\phi$.
To match the standard form of the primal problem, we rewrite it, replacing each $\xi_i$ by its negative so that both constraint families take the "$\le 0$" form:

$$\begin{aligned}
\min\quad&\frac{1}{2}||w||^2-C\sum_{i=1}^N\xi_i\\
\operatorname{s.t.}\quad &1+\xi_i-y_i w^T\phi(x_i)-y_ib\le 0\\
&\xi_i\le0
\end{aligned}$$
Definition of a convex function: $f$ is convex if, for all $x_1,x_2$ and $\lambda\in[0,1]$,

$$f(\lambda x_1+(1-\lambda)x_2)\le \lambda f(x_1)+(1-\lambda)f(x_2)$$
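The SVM objective $f(w)=||w||^2$ satisfies this inequality; a numerical spot-check (random points and mixing weights, a sketch rather than a proof):

```python
import numpy as np

# Check f(lam*x1 + (1-lam)*x2) <= lam*f(x1) + (1-lam)*f(x2) for f(w) = ||w||^2.
rng = np.random.default_rng(1)
f = lambda w: np.sum(w ** 2)

ok = True
for _ in range(100):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    lam = rng.uniform()
    if f(lam * x1 + (1 - lam) * x2) > lam * f(x1) + (1 - lam) * f(x2) + 1e-12:
        ok = False
```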
The dual problem of our SVM is:
Maximize:

$$\theta(\alpha,\beta) = \inf_{(w,\xi_i,b)}\left\{\frac{1}{2}||w||^2-C\sum_{i=1}^N\xi_i+\sum_{i=1}^N\beta_i\xi_i+\sum_{i=1}^N\alpha_i[1+\xi_i-y_i w^T\phi(x_i)-y_ib]\right\}$$

Subject to:
[*]$\alpha_i\ge 0$
[*]$\beta_i\ge 0$
To evaluate $\theta(\alpha,\beta)$ we must first minimize over $w,\xi_i,b$, so we take partial derivatives of $L$:

$$\begin{aligned}
\frac{\partial L}{\partial w} = 0&\Rightarrow w = \sum_{i=1}^N\alpha_iy_i\phi(x_i)\\
\frac{\partial L}{\partial \xi_i} = 0&\Rightarrow \beta_i+\alpha_i=C\\
\frac{\partial L}{\partial b}=0&\Rightarrow \sum_{i=1}^N\alpha_iy_i=0
\end{aligned}$$

Substituting these back, we obtain:

$$\theta(\alpha,\beta) = \sum_{i=1}^N\alpha_i-\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_jK(x_i,x_j)$$

where $K(x_i,x_j) = \phi(x_i)^T\phi(x_j)$.
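The dual objective above is straightforward to evaluate for a candidate $\alpha$; a minimal sketch using a linear kernel and hand-picked toy values (illustrative only):

```python
import numpy as np

def dual_objective(alpha, X, y):
    # theta(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j),
    # here with the linear kernel K(x_i, x_j) = x_i^T x_j.
    K = X @ X.T
    u = alpha * y
    return alpha.sum() - 0.5 * u @ K @ u

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])       # satisfies sum_i alpha_i y_i = 0

theta = dual_objective(alpha, X, y)  # 1 - 0.5 = 0.5
```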
The dual optimization problem therefore becomes:
Maximize:

$$\theta(\alpha) = \sum_{i=1}^N\alpha_i-\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_jK(x_i,x_j)$$

Subject to:
[*]$0\le\alpha_i\le C$ (from $\alpha_i\ge0$, $\beta_i\ge0$, and $\alpha_i+\beta_i=C$)
[*]$\sum_{i=1}^N\alpha_iy_i=0$
This is again a convex optimization problem, and there is a standard algorithm for solving it: SMO.
This lets us solve for $\alpha_i,i=1\sim N$, but what we actually need are $w$ and $b$. How do we recover them? We could use the earlier identity $w = \sum_{i=1}^N\alpha_iy_i\phi(x_i)$, but the problem is that we do not know $\phi(x_i)$.
Fortunately, to classify a sample we never need $w$ explicitly. Given a test sample $x$, we know:
[*]if $w^T\phi(x)+b\ge0$, then $y=+1$;
[*]if $w^T\phi(x)+b<0$, then $y=-1$;
and

$$\begin{aligned}
w^T\phi(x) &= \sum_{i=1}^N\alpha_iy_i\phi(x_i)^T\phi(x)\\
&= \sum_{i=1}^N\alpha_iy_iK(x_i,x)
\end{aligned}$$
But how do we compute $b$?
Applying the KKT conditions, we have:
[*]either $\beta_i=0$ or $\xi_i=0$;
[*]either $\alpha_i=0$ or $1+\xi_i-y_iw^T\phi(x_i)-y_ib=0$.
Take any $i$ with $0<\alpha_i<C$, so that $\beta_i=C-\alpha_i>0$.
Then $\beta_i\neq0\Rightarrow\xi_i=0$, and since $\alpha_i\neq0$ we get $b = y_i - \sum_{j=1}^N\alpha_jy_jK(x_i,x_j)$. In practice, we can also find all $\alpha_i$ strictly between $0$ and $C$, compute $b$ from each, and take the average.
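The recovery of $b$ can be sketched on the same two-point toy problem with a linear kernel (for this symmetric problem, $\alpha=(0.5,0.5)$ with $C=1$ is the known dual solution, worked out by hand, and the resulting $b$ is $0$):

```python
import numpy as np

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
C = 1.0

def linear_kernel(a, b):
    return a.dot(b)

# Pick any index with 0 < alpha_i < C, then b = y_i - sum_j alpha_j y_j K(x_j, x_i).
i = int(np.flatnonzero((alpha > 0) & (alpha < C))[0])
b = y[i] - sum(alpha[j] * y[j] * linear_kernel(X[j], X[i]) for j in range(len(y)))
```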
The SMO Algorithm
The problem to solve:
Maximize:

$$\theta(\alpha) = \sum_{i=1}^N\alpha_i-\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_jy_iy_jK(x_i,x_j)$$

Subject to:
[*]$0\le\alpha_i\le C$
[*]$\sum_{i=1}^N\alpha_iy_i=0$
Because there are many variables to optimize ($\alpha_1,\dots,\alpha_N$), each iteration updates only a few of them. And because this is a constrained problem, updating variables can break the constraints; to preserve the equality constraint, we must change at least two variables at a time. SMO therefore selects exactly two variables per iteration.
Solving the two-variable quadratic subproblem
[*]Select two variables and fix all the others.
[*]SMO thereby turns the dual problem into a sequence of subproblems:

$$\begin{aligned}
\min _{\alpha_{1}, \alpha_{2}} & W\left(\alpha_{1}, \alpha_{2}\right)=\frac{1}{2} K_{11} \alpha_{1}^{2}+\frac{1}{2} K_{22} \alpha_{2}^{2}+y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\
& -\left(\alpha_{1}+\alpha_{2}\right)+y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i 1}+y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i 2} \\
\text { s.t. } & \alpha_{1} y_{1}+\alpha_{2} y_{2}=-\sum_{i=3}^{N} y_{i} \alpha_{i}=\zeta \\
& 0 \leq \alpha_{i} \leq C, i=1,2
\end{aligned}$$

[*]By the equality constraint, $\alpha_1$ can be expressed as a function of $\alpha_2$.
[*]The resulting single-variable optimization problem has an analytical solution.
[*]Starting from a feasible point $\alpha_1^{old},\alpha_2^{old}$, we can obtain the updated $\alpha_1^{new},\alpha_2^{new}$.
With two variables, the constraints can be drawn in the two-dimensional plane:
Consider first the case $y_1\neq y_2$. The inequality constraints restrict $\alpha_2^{new}$ to the range $L\le\alpha_2^{new}\le H$, where

$$L = \max(0,\alpha_2^{old}-\alpha_1^{old})\quad H = \min(C,C+\alpha_2^{old}-\alpha_1^{old})$$

Similarly, in the case $y_1=y_2$, the inequality constraints give the range with

$$L = \max(0,\alpha_2^{old}+\alpha_1^{old}-C)\quad H = \min(C,\alpha_2^{old}+\alpha_1^{old})$$
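The two cases above translate directly into a small helper (a sketch; the test values are arbitrary):

```python
def clip_bounds(alpha_i, alpha_j, y_i, y_j, C):
    # Feasible range [L, H] for the updated alpha_j (alpha_2 in the text).
    if y_i != y_j:
        L = max(0.0, alpha_j - alpha_i)
        H = min(C, C + alpha_j - alpha_i)
    else:
        L = max(0.0, alpha_j + alpha_i - C)
        H = min(C, alpha_j + alpha_i)
    return L, H
```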
Now we solve. The process of obtaining $\alpha_2^{new}$ is:
[*]first find the unclipped solution $\alpha_2^{new,unc}$ along the constraint direction;
[*]then clip it to obtain $\alpha_2^{new}$.
Write

$$g(x)=\sum_{i=1}^N\alpha_iy_iK(x_i,x)+b$$

for our decision expression, and let

$$E_i = g(x_i)-y_i$$

be the difference between the prediction for input $x_i$ and the true output $y_i$.
For brevity, introduce the notation:

$$v_i = \sum_{j=3}^N\alpha_jy_jK(x_i,x_j) = g(x_i) - \sum_{j=1}^2\alpha_jy_jK(x_i,x_j)-b$$
The objective function can then be written:

$$\begin{aligned}
W\left(\alpha_{1}, \alpha_{2}\right)=& \frac{1}{2} K_{11} \alpha_{1}^{2}+\frac{1}{2} K_{22} \alpha_{2}^{2}+y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\
&-\left(\alpha_{1}+\alpha_{2}\right)+y_{1} v_{1} \alpha_{1}+y_{2} v_{2} \alpha_{2}
\end{aligned}$$

From $\alpha_1y_1 = \zeta-\alpha_2y_2$ and $y_i^2=1$, we get $\alpha_1 = (\zeta-y_2\alpha_2)y_1$. Substituting this yields an objective in $\alpha_2$ alone:

$$\begin{aligned}
W(\alpha_2) &= \frac{1}{2}K_{11}(\zeta-\alpha_2y_2)^2 + \frac{1}{2}K_{22}\alpha_2^2+y_2K_{12}(\zeta-\alpha_2y_2)\alpha_2\\
&- (\zeta-\alpha_2y_2)y_1-\alpha_2+v_1(\zeta-\alpha_2y_2)+y_2v_2\alpha_2
\end{aligned}$$

Differentiating with respect to $\alpha_2$ and setting the derivative to $0$ gives:

$$\begin{aligned}
&\left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}=y_{2}\left(y_{2}-y_{1}+\zeta K_{11}-\zeta K_{12}+v_{1}-v_{2}\right) \\
&=y_{2}\left[y_{2}-y_{1}+\zeta K_{11}-\zeta K_{12}+\left(g\left(x_{1}\right)-\sum_{j=1}^{2} y_{j} \alpha_{j} K_{1 j}-b\right)-\left(g\left(x_{2}\right)-\sum_{j=1}^{2} y_{j} \alpha_{j} K_{2 j}-b\right)\right]
\end{aligned}$$

Substituting $\zeta=\alpha_1^{old}y_1+\alpha_2^{old}y_2$:

$$\begin{aligned}
\left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}^{new,unc} &=y_{2}\left(\left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}^{old} y_{2}+y_{2}-y_{1}+g\left(x_{1}\right)-g\left(x_{2}\right)\right) \\
&=\left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}^{old}+y_{2}\left(E_{1}-E_{2}\right)
\end{aligned}$$

Writing $\eta = K_{11}+K_{22}-2K_{12}$, this becomes:

$$\alpha_2^{new,unc} = \alpha_2^{old}+\frac{y_2(E_1-E_2)}{\eta}$$
We then clip the solution:

$$\alpha_2^{new}=\begin{cases}
H,\quad &\alpha_2^{new,unc}>H\\
\alpha_2^{new,unc},\quad&L\le \alpha_2^{new,unc}\le H\\
L,\quad& \alpha_2^{new,unc}<L
\end{cases}$$

and obtain $\alpha_1^{new}$ from the equality constraint:

$$\alpha_1^{new} = \alpha_1^{old}+y_1y_2(\alpha_2^{old}-\alpha_2^{new})$$
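The unclipped update, the clipping, and the $\alpha_1$ update fit in one small function (a sketch; the numbers in the test are arbitrary but chosen so that the clip at $L$ actually triggers):

```python
def smo_step(a1, a2, y1, y2, E1, E2, K11, K22, K12, L, H):
    # One two-variable update: unclipped solution, clip to [L, H],
    # then recover a1 so that a1*y1 + a2*y2 stays constant.
    eta = K11 + K22 - 2 * K12
    a2_unc = a2 + y2 * (E1 - E2) / eta
    a2_new = min(H, max(L, a2_unc))
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new
```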
Regarding the KKT conditions, we have:

$$\begin{array}{r}
\alpha_{i}=0 \Leftrightarrow y_{i} g\left(x_{i}\right) \geqslant 1 \\
0<\alpha_{i}<C \Leftrightarrow y_{i} g\left(x_{i}\right)=1 \\
\alpha_{i}=C \Leftrightarrow y_{i} g\left(x_{i}\right) \leqslant 1
\end{array}$$
Next we compute the threshold $b$ and the differences $E_i$.
By the KKT conditions, if $0<\alpha_1^{new}<C$, then

$$\sum_{i=1}^N\alpha_iy_iK_{i1}+b=y_1$$

$$b_1^{new} = y_1-\sum_{i=3}^N\alpha_iy_iK_{i1}-\alpha_1^{new}y_1K_{11}-\alpha_2^{new}y_2K_{21}$$

Also

$$E_1 = \sum_{i=3}^N\alpha_iy_iK_{i1}+\alpha_1^{old}y_1K_{11}+\alpha_2^{old}y_2K_{21}+b^{old}-y_1$$

Combining the expression for $E_1$ with that for $b_1^{new}$, we get:

$$b_1^{new} = -E_1-y_1K_{11}(\alpha_1^{new}-\alpha_1^{old})-y_2K_{21}(\alpha_2^{new}-\alpha_2^{old})+b^{old}$$
Similarly, if $0<\alpha_2^{new}<C$, then

$$b_{2}^{new}=-E_{2}-y_{1} K_{12}\left(\alpha_{1}^{new}-\alpha_{1}^{old}\right)-y_{2} K_{22}\left(\alpha_{2}^{new}-\alpha_{2}^{old}\right)+b^{old}$$

If $0<\alpha_i^{new}<C$ holds for both variables, then $b_1^{new}=b_2^{new}$. If $\alpha_1^{new},\alpha_2^{new}$ are $0$ or $C$, then $b_1^{new}$, $b_2^{new}$, and every value between them all satisfy the KKT conditions, and we take their midpoint as $b^{new}$.
After each optimization of the two variables, we must also update the corresponding $E_i$ values and store them in a list. Updating $E_i$ uses $b^{new}$ and the $\alpha_j$ of all support vectors:

$$E_i^{new} = \sum_{S}y_j\alpha_jK(x_i,x_j)+b^{new}-y_i$$

where $S$ is the set of all support vectors.
Heuristic selection of the variables
In each subproblem, SMO selects two variables to optimize, at least one of which violates the KKT conditions.
[*]Choice of the first variable: the outer loop
[*]pick the sample point that violates the KKT conditions most severely
[*]to check whether a sample point satisfies the KKT conditions:

$$\begin{array}{r} \alpha_{i}=0 \Leftrightarrow y_{i} g\left(x_{i}\right) \geqslant 1 \\ 0<\alpha_{i}<C\Leftrightarrow y_{i} g\left(x_{i}\right)=1 \\ \alpha_{i}=C \Leftrightarrow y_{i} g\left(x_{i}\right) \leqslant 1\end{array}$$

The check is carried out within a tolerance $\epsilon$. The outer loop first sweeps the sample points satisfying $0<\alpha_i<C$, i.e. the support vectors on the margin boundary, and checks whether they satisfy the KKT conditions; if they all do, it sweeps the entire training set and checks every sample.
[*]Choice of the second variable: the inner loop
[*]the criterion is that the update should change the objective as much as possible
[*]i.e. pick the point that maximizes $|E_1-E_2|$
[*]if the point found this way does not decrease the objective enough, sweep the sample points on the margin boundary and test the decrease
[*]if the decrease is still too small, sweep all sample points
[*]if it is still too small, discard the outer-loop point and reselect
Algorithm Implementation
```python
import numpy as np


def kernal(a, b):
    # Linear kernel; for a Gaussian kernel use
    # np.exp(-np.sum((a - b) ** 2) / (2 * s ** 2)) with bandwidth s.
    return a.dot(b)


class SVM:
    def __init__(self, data, label):
        self.data = data
        self.label = label
        self.len = data.shape[0]
        self.E = np.zeros(self.len)
        self.alpha = np.random.random(self.len)
        self.b = 0.0

    def f(self, i):
        # Decision value g(x_i) for the sample with index i.
        s = 0.0
        for k in range(self.len):
            s += self.alpha[k] * self.label[k] * kernal(self.data[k], self.data[i])
        return s + self.b

    def error(self):
        # E_i = g(x_i) - y_i for every sample.
        for i in range(self.len):
            self.E[i] = self.f(i) - self.label[i]

    def bound(self, i, j, alpha_i, alpha_j, C):
        # i, j: indices of the two alphas being optimized.
        # Returns the feasible range (L, H) for alpha_j.
        # C is the regularization coefficient.
        if self.label[i] != self.label[j]:
            L, H = max(0.0, alpha_j - alpha_i), min(C, C + alpha_j - alpha_i)
        else:
            L, H = max(0.0, alpha_j + alpha_i - C), min(C, alpha_j + alpha_i)
        return (L, H)

    def update(self, i, j, C):
        # Update alpha_i, alpha_j and b.
        eta = (kernal(self.data[i], self.data[i])
               + kernal(self.data[j], self.data[j])
               - 2 * kernal(self.data[i], self.data[j]))
        if eta <= 0:
            return
        alpha_old_i = self.alpha[i]
        alpha_old_j = self.alpha[j]
        # Unclipped solution, then clip to [L, H].
        self.alpha[j] = alpha_old_j + self.label[j] * (self.E[i] - self.E[j]) / eta
        L, H = self.bound(i, j, alpha_old_i, alpha_old_j, C)
        if self.alpha[j] >= H:
            self.alpha[j] = H
        elif self.alpha[j] <= L:
            self.alpha[j] = L
        self.alpha[i] = alpha_old_i + self.label[i] * self.label[j] * (alpha_old_j - self.alpha[j])
        b1 = (self.b - self.E[i]
              - self.label[i] * (self.alpha[i] - alpha_old_i) * kernal(self.data[i], self.data[i])
              - self.label[j] * (self.alpha[j] - alpha_old_j) * kernal(self.data[j], self.data[i]))
        b2 = (self.b - self.E[j]
              - self.label[i] * (self.alpha[i] - alpha_old_i) * kernal(self.data[i], self.data[j])
              - self.label[j] * (self.alpha[j] - alpha_old_j) * kernal(self.data[j], self.data[j]))
        self.b = (b1 + b2) / 2
        self.error()

    def smo(self, epsilon, max_iter, C):
        self.error()
        for _ in range(max_iter):
            # Outer loop: among 0 < alpha < C, pick the worst KKT violator.
            I = np.intersect1d(np.argwhere(self.alpha > 0), np.argwhere(self.alpha < C))
            if len(I) == 0:
                I = np.arange(self.len)
            d = [np.abs(self.label[i] * self.f(i) - 1) for i in I]
            i = int(I[np.argmax(np.array(d))])
            # Inner loop: maximize |E_i - E_j|.
            j = int(np.argmax(np.abs(self.E[i] - self.E)))
            self.update(i, j, C)
            # Stop when every sample satisfies the KKT conditions within epsilon:
            #   alpha = 0      ->  y * g(x) >= 1
            #   0 < alpha < C  ->  y * g(x) == 1
            #   alpha = C      ->  y * g(x) <= 1
            yg = np.array([self.label[k] * self.f(k) for k in range(self.len)])
            ge = yg[self.alpha <= 0]
            eq = yg[(self.alpha > 0) & (self.alpha < C)]
            le = yg[self.alpha >= C]
            n_violate = (np.sum(ge < 1 - epsilon)
                         + np.sum(np.abs(eq - 1) > epsilon)
                         + np.sum(le > 1 + epsilon))
            if n_violate == 0:
                break

    def predict(self, x):
        s = 0.0
        for k in range(self.len):
            s += self.alpha[k] * self.label[k] * kernal(self.data[k], x)
        s = s + self.b
        if s >= 0:
            return 1
        else:
            return -1
```
This article was composed and published with Zhihu On VSCode.