Deep Learning Algorithm Optimization Series 16 | OpenVINO Post-Training ...

kyuskoj · posted on 2021-12-20 13:20

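This is a translation of the principles and usage tutorial for the INT8 quantization tool in the OpenVINO 2020 Post-Training Optimization Toolkit. The original document is available at: http://docs.openvinotoolkit.org/latest/_README.html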

Quantization

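The tool is primarily a unified quantization tool. In general, the method supports an arbitrary number of bits (>=2) to represent weights and activations. During quantization, FakeQuantize operations are automatically inserted into the model graph according to a predefined hardware target, producing a hardware-friendly optimized model. Different quantization algorithms can then tune the FakeQuantize parameters or remove some of the operations to meet the accuracy criteria. Finally, at runtime this fake-quantized model is interpreted and converted into a real low-precision model, which delivers the actual performance improvement.

Quantization Algorithms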

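The toolkit provides multiple quantization and auxiliary algorithms that help restore model accuracy after the weights and activation maps are quantized. In principle, the algorithms can form independent optimization pipelines that optimize one or more models. However, only the following two algorithms for 8-bit quantization have been validated, and they are the recommended way to obtain stable and reliable results when quantizing DNN models: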

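DefaultQuantization is used as the default method to obtain fast and, in most cases, fairly accurate INT8 quantization results.
AccuracyAwareQuantization keeps the accuracy drop after quantization within a predefined range, at the cost of part of the performance gain. Quantization may take more time.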

Quantization Principles

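Quantization is parametrized by the quantization range and the number of quantization levels. The sampling formula is as follows:

$$ output = \frac{\left\lfloor \left(clamp(input;\ input\_low, input\_high) - input\_low\right) \cdot s \right\rceil}{s} + input\_low $$

$$ clamp(input;\ input\_low, input\_high) = \min(\max(input, input\_low), input\_high) $$

$$ s = \frac{levels - 1}{input\_high - input\_low} $$

where input_low and input_high represent the quantization range, levels is the number of quantization levels, and $\left\lfloor \cdot \right\rceil$ denotes rounding to the nearest integer.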
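For illustration, the sampling formula can be written as a scalar quantize-dequantize function in Python. This is a minimal sketch, not POT's implementation: it handles single floats only, and rounding ties upward via math.floor(v + 0.5) is an assumption, since the formula only says "round to the nearest integer".

```python
import math


def clamp(x, low, high):
    """clamp(input; input_low, input_high) = min(max(input, input_low), input_high)."""
    return min(max(x, low), high)


def fake_quantize(x, input_low, input_high, levels):
    """Quantize then dequantize one value.

    output = round((clamp(x) - input_low) * s) / s + input_low,
    with s = (levels - 1) / (input_high - input_low).
    The result is one of `levels` evenly spaced points in [input_low, input_high].
    """
    if input_high <= input_low:
        raise ValueError("input_high must be greater than input_low")
    s = (levels - 1) / (input_high - input_low)
    # Integer level index in [0, levels - 1]; ties round up (assumption).
    q = math.floor((clamp(x, input_low, input_high) - input_low) * s + 0.5)
    return q / s + input_low


print(fake_quantize(0.3, 0.0, 1.0, 3))     # 0.5
print(fake_quantize(5.0, -1.0, 1.0, 256))  # 1.0 (clamped to input_high)
```

With levels = 3 on [0, 1] only 0.0, 0.5 and 1.0 are representable, which makes the effect of rounding and clamping easy to see.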
Symmetric Quantization

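The formula is parametrized by the scale parameter, which is tuned during the quantization process:

$$ input\_low = scale \cdot \frac{level\_low}{level\_high}, \quad input\_high = scale $$

In the formulas above, level_low and level_high represent the range of discrete values.

For weights:

$$ level\_low = -2^{bits-1} + 1, \quad level\_high = 2^{bits-1} - 1, \quad levels = 2^{bits} - 1 \ (255 \text{ for 8 bits}) $$

For unsigned activations:

$$ level\_low = 0, \quad level\_high = 2^{bits} - 1, \quad levels = 2^{bits} $$

For signed activations:

$$ level\_low = -2^{bits-1}, \quad level\_high = 2^{bits-1} - 1, \quad levels = 2^{bits} $$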
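The symmetric level ranges can be tabulated with a small helper. This is a sketch: the kind strings ("weights", "unsigned_activations", "signed_activations") are hypothetical labels for the three cases above, not POT configuration values.

```python
def symmetric_levels(bits, kind):
    """Return (level_low, level_high, levels) for symmetric quantization."""
    if kind == "weights":
        # Narrow range: the most negative code is dropped so the grid is symmetric around 0.
        return -(2 ** (bits - 1)) + 1, 2 ** (bits - 1) - 1, 2 ** bits - 1
    if kind == "unsigned_activations":
        return 0, 2 ** bits - 1, 2 ** bits
    if kind == "signed_activations":
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1, 2 ** bits
    raise ValueError(f"unknown kind: {kind!r}")


def symmetric_range(scale, bits, kind):
    """input_low = scale * level_low / level_high, input_high = scale."""
    level_low, level_high, _ = symmetric_levels(bits, kind)
    return scale * level_low / level_high, scale


for kind in ("weights", "unsigned_activations", "signed_activations"):
    print(kind, symmetric_levels(8, kind), symmetric_range(1.0, 8, kind))
```

For 8-bit weights the range [-scale, scale] is split into 254 steps, so the step size is scale / 127 and every integer code in [-127, 127] maps to an exact multiple of it.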

Asymmetric Quantization

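The quantization formula is parametrized by input_low and input_range, which are tunable parameters:

$$ input\_high = input\_low + input\_range, \quad levels = 256 $$

For both weights and activations the following quantization mode is applied. The range is first extended to include zero, and then adjusted so that zero maps exactly onto the integer zero point ZP:

$$ input\_low' = \min(input\_low, 0), \quad input\_high' = \max(input\_high, 0) $$

$$ ZP = \left\lfloor \frac{-input\_low' \cdot (levels - 1)}{input\_high' - input\_low'} \right\rceil $$

$$ input\_high'' = \frac{ZP - levels + 1}{ZP} \cdot input\_low', \quad input\_low'' = \frac{ZP}{ZP - levels + 1} \cdot input\_high' $$

$$ (input\_low, input\_high) = \begin{cases} (input\_low', input\_high'), & ZP \in \{0, levels - 1\} \\ (input\_low', input\_high''), & input\_high'' - input\_low' > input\_high' - input\_low'' \\ (input\_low'', input\_high'), & \text{otherwise} \end{cases} $$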
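A direct transcription of the zero-point alignment, as a sketch on scalar ranges (round-half-up for the nearest-integer rounding is again an assumption). Of the two candidate ranges, one shrinks and one grows; picking the wider one keeps the original range covered while 0.0 lands exactly on the integer level ZP.

```python
import math


def round_half_up(v):
    return math.floor(v + 0.5)


def align_asymmetric_range(input_low, input_high, levels=256):
    """Adjust [input_low, input_high] so that 0.0 is exactly representable."""
    low1 = min(input_low, 0.0)    # input_low'
    high1 = max(input_high, 0.0)  # input_high'
    if high1 <= low1:
        raise ValueError("the range must not be empty")
    zp = round_half_up(-low1 * (levels - 1) / (high1 - low1))
    if zp in (0, levels - 1):
        # Zero already sits at one end of the range.
        return low1, high1
    high2 = (zp - levels + 1) / zp * low1   # input_high''
    low2 = zp / (zp - levels + 1) * high1   # input_low''
    if high2 - low1 > high1 - low2:
        return low1, high2
    return low2, high1


low, high = align_asymmetric_range(-1.0, 3.0)
print(low, high)
print(-low * 255 / (high - low))  # position of 0.0 on the integer grid (ZP)
```

For the range [-1, 3] the zero point rounds from 63.75 to 64, so the lower border is pushed slightly below -1 to make 0.0 fall exactly on level 64.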

DefaultQuantization

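The DefaultQuantization algorithm is designed to perform fast and accurate INT8 quantization of neural networks. It consists of three algorithms that are applied to the model in sequence:

ActivationChannelAlignment is used as a preparatory step before quantization; it adjusts the output activation ranges of convolutional layers to reduce the quantization error.
MinMaxQuantization is a vanilla quantization method that automatically inserts FakeQuantize operations into the model graph according to the specified target hardware and initializes them with statistics collected on the calibration dataset.
BiasCorrection adjusts the biases of convolutional and fully connected layers based on their quantization error, so that the overall error becomes unbiased.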
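To make the MinMaxQuantization and BiasCorrection steps concrete, here is a toy sketch in pure Python. The "max"/"quantile" estimator types mirror the range_estimator options of the configuration file, but treating "quantile" as the (1 - outlier_prob) quantile with linear interpolation, using "min" as the default bottom estimator, and modelling bias correction as adding the mean output error of a single fully connected layer are all assumptions for illustration; POT's internal implementation may differ.

```python
import math


def quantile(values, q):
    """q-th quantile of the samples, linearly interpolating between sorted values."""
    data = sorted(values)
    pos = (len(data) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def estimate_top(values, type_="max", outlier_prob=1e-4):
    """Top quantization border: plain maximum, or a quantile that drops outliers."""
    if type_ == "max":
        return max(values)
    if type_ == "quantile":
        return quantile(values, 1.0 - outlier_prob)
    raise ValueError(f"unknown estimator type: {type_!r}")


def estimate_bottom(values, type_="min", outlier_prob=1e-4):
    """Bottom quantization border (only needed in asymmetric mode)."""
    if type_ == "min":
        return min(values)
    if type_ == "quantile":
        return quantile(values, outlier_prob)
    raise ValueError(f"unknown estimator type: {type_!r}")


def matvec(w, x):
    """Dense matrix-vector product for a toy fully connected layer."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def bias_correction(w_fp, w_q, bias, calib_inputs):
    """Add the mean output error (full precision minus quantized) to the bias."""
    shift = [0.0] * len(bias)
    for x in calib_inputs:
        y_fp, y_q = matvec(w_fp, x), matvec(w_q, x)
        for i in range(len(bias)):
            shift[i] += (y_fp[i] - y_q[i]) / len(calib_inputs)
    return [b + s for b, s in zip(bias, shift)]


acts = list(range(100)) + [10000]  # calibration activations with one outlier
print(estimate_top(acts))  # 10000
print(estimate_top(acts, "quantile", outlier_prob=0.01))  # far below the outlier
print(bias_correction([[1.0, 2.0]], [[1.0, 1.5]], [0.0], [[1.0, 2.0], [3.0, 0.0]]))  # [0.5]
```

The quantile estimator shows why outlier filtering matters: a single extreme activation would otherwise stretch the range and waste most of the 256 levels.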

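The algorithm uses a two-stage statistics collection procedure, so the quantization time depends mainly on the size of the calibration subset used for it.
Parameters

The algorithm accepts all the parameters introduced by the three algorithms it relies on. These parameters can be roughly divided into two groups: required and optional.

The required parameters are the few described in the following example: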

"name": "DefaultQuantization", // optimization algorithm name
"params": {
      "preset": "performance", // Preset [performance (default), accuracy] which controls the quantization mode (symmetric and asymmetric respectively)
      "stat_subset_size": 300 // Size of subset to calculate activations statistics used for quantization. The whole dataset is used if no parameter specified
}
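If you generate configurations from a script, the required part maps directly onto a Python dictionary. The helper below is hypothetical (not part of POT) and only encodes what the comments state: preset is "performance" (default) or "accuracy", selecting symmetric or asymmetric quantization, and omitting stat_subset_size means the whole dataset is used.

```python
import json

# From the comment on "preset": performance -> symmetric, accuracy -> asymmetric.
PRESET_TO_MODE = {"performance": "symmetric", "accuracy": "asymmetric"}


def default_quantization(preset="performance", stat_subset_size=None):
    """Build the DefaultQuantization algorithm entry (hypothetical helper)."""
    if preset not in PRESET_TO_MODE:
        raise ValueError(f"preset must be one of {sorted(PRESET_TO_MODE)}")
    params = {"preset": preset}
    if stat_subset_size is not None:
        # Leaving the key out means statistics are collected on the whole dataset.
        params["stat_subset_size"] = stat_subset_size
    return {"name": "DefaultQuantization", "params": params}


algo = default_quantization("performance", stat_subset_size=300)
print(json.dumps(algo, indent=4))
print("quantization mode:", PRESET_TO_MODE[algo["params"]["preset"]])
```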

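All other options can be considered advanced and require a deep understanding of the quantization process. Below is an overall description of all possible parameters: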

"name": "DefaultQuantization", // optimization algorithm name
"params": {
      /* Preset is a collection of optimization algorithm parameters that tells the algorithm
      which metric it should concentrate on improving. Each optimization algorithm supports
      [performance, accuracy] presets which control the quantization mode (symmetric and asymmetric respectively) */
      "preset": "accuracy",
      "stat_subset_size": 300, // Size of subset to calculate activations statistics that can be used
                              // for quantization parameters calculation.
      "ignored": {
         "scope": [
            "<NODE_NAME>" // List of nodes that are excluded from optimization
         ],
         "operations": [ // List of types that are excluded from optimization
            {
                  "type": "<NODE_TYPE>", // Type of ignored operation
                  "attributes": { // If attributes are defined, they are also used to decide which operations to ignore
                     "<NAME>": "<VALUE>" // Lists of values to filter by
                  }
            }
         ]
      },
      /* Manually specified quantization parameters */
      /* Quantization parameters for weights */
      "weights": {  // Weights quantization parameters used by MinMaxQuantization
         "bits": 8, // Bit-width, default is 8
         "mode": "symmetric", // Quantization mode, default is "symmetric"
         "level_low": 0,    // Minimum level of the integer range to quantize to, default is 0 for unsigned range, -2^(bit-1) for signed
         "level_high": 255, // Maximum level of the integer range to quantize to, default is 2^bits-1 for unsigned range, 2^(bit-1)-1 for signed
         "granularity": "perchannel", // Quantization scale granularity: ["pertensor" (default), "perchannel"]
         "range_estimator": {       // Range estimator that is used to get the quantization ranges and filter outliers based on the statistics
            "max": {                // Parameters to estimate top quantization border
                  "type": "quantile", // Estimator type: ["max" (default), "quantile"]
                  "outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
            },
            "min": {                // Parameters to estimate bottom quantization border (used only in asymmetric mode)
                  "type": "quantile", // Estimator type: ["max" (default), "quantile"]
                  "outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
            }
         }
      },
      /* Quantization parameters for activations */
      "activations": {
         "range_estimator": {          // Range estimator that is used to get the quantization ranges and filter outliers based on the statistics
            "preset": "quantile",
            /* OR define the estimators for the maximum and minimum of the quantization range explicitly: */
            "max": {                // Parameters to estimate top quantization border
                  "aggregator": "mean",  // Batch aggregation type: ["mean" (default), "max", "min", "median", "mean_no_outliers", "median_no_outliers", "hl_estimator"]
                  "type": "quantile", // Estimator type: ["max" (default), "quantile"]
                  "outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
            },
            "min": {                // Parameters to estimate bottom quantization border
                  "aggregator": "mean",  // Batch aggregation type: ["mean" (default), "max", "min", "median", "mean_no_outliers", "median_no_outliers", "hl_estimator"]
                  "type": "quantile", // Estimator type [min, max, abs_max, quantile, abs_quantile]
                  "outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
            }
         }
      }
}

AccuracyAwareQuantization

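Overview

The AccuracyAware algorithm is designed to perform accurate INT8 quantization while keeping the accuracy drop within a predefined range, for example 1%. Compared with the DefaultQuantization algorithm, this may result in lower performance, because some layers can be reverted to the original precision. In general, the algorithm consists of the following steps: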

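1. The model is fully quantized with the DefaultQuantization algorithm.
2. The quantized and full-precision models are compared on a subset of the validation set to find mismatches in the target accuracy metric. A ranking subset is extracted based on these mismatches.
3. A layer-wise ranking is performed to obtain the contribution of each quantized layer to the accuracy drop.
4. Based on the ranking, the most "problematic" layer is reverted to the original precision. After this change, the resulting model is evaluated on the full validation set to obtain the new accuracy drop.
5. If all predefined accuracy criteria are satisfied, the algorithm finishes. Otherwise, it continues by reverting the next "problematic" layer.
6. A revert may bring no accuracy improvement or may even reduce accuracy. In that case, the layers are re-ranked as described in step 3.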
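The iterative procedure can be sketched as a greedy loop. Everything here is hypothetical: evaluate and rank_layers are caller-supplied callbacks standing in for validation-set evaluation and layer-wise ranking, defining the relative drop as (fp - quantized) / fp is an assumption, and options such as use_prev_if_drop_increase are not modelled.

```python
import sys


def accuracy_drop(fp_metric, q_metric, drop_type="absolute"):
    """Accuracy drop; the "relative" variant is normalized by the full-precision metric (assumption)."""
    drop = fp_metric - q_metric
    return drop / fp_metric if drop_type == "relative" else drop


def accuracy_aware_revert(layers, evaluate, rank_layers, fp_metric,
                          maximal_drop=0.005, drop_type="absolute",
                          max_iter_num=sys.maxsize):
    """Greedy sketch of steps 1-6 with evaluation and ranking abstracted away.

    evaluate(quantized_layers) -> metric value
    rank_layers(quantized_layers) -> layers ordered from most to least problematic
    Returns (layers still quantized, layers reverted to full precision in order).
    """
    quantized = set(layers)                      # step 1: everything quantized
    reverted = []
    drop = accuracy_drop(fp_metric, evaluate(quantized), drop_type)
    ranking = list(rank_layers(quantized))       # steps 2-3: layer-wise ranking
    iterations = 0
    while drop > maximal_drop and ranking and iterations < max_iter_num:
        layer = ranking.pop(0)                   # step 4: revert the worst layer
        quantized.discard(layer)
        reverted.append(layer)
        new_drop = accuracy_drop(fp_metric, evaluate(quantized), drop_type)
        if new_drop >= drop:                     # step 6: no gain, so re-rank
            ranking = list(rank_layers(quantized))
        drop = new_drop                          # step 5: loop stops once criteria hold
        iterations += 1
    return quantized, reverted


# Toy model: each quantized layer costs a fixed amount of accuracy (made-up numbers).
cost = {"conv1": 0.001, "conv2": 0.02, "fc": 0.004}
fp = 0.76


def toy_evaluate(quantized_layers):
    return fp - sum(cost[name] for name in quantized_layers)


def toy_rank(quantized_layers):
    return sorted(quantized_layers, key=lambda name: -cost[name])


kept, reverted = accuracy_aware_revert(cost, toy_evaluate, toy_rank, fp, maximal_drop=0.01)
print("reverted:", reverted, "kept quantized:", sorted(kept))
```

Each iteration removes one layer from the quantized set, so the loop always terminates after at most as many iterations as there are layers, even when re-ranking is triggered.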

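Parameters

Since the DefaultQuantization algorithm is used for initialization, all of its parameters are also valid and can be specified. Here we only describe the parameters specific to AccuracyAware: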
"name": "AccuracyAwareQuantization", // compression algorithm name
"params": {
      "metric_subset_ratio": 0.5, // A part of the validation set that is used to compare full-precision and quantized models
      "ranking_subset_size": 300, // A size of a subset which is used to rank layers by their contribution to the accuracy drop
      "max_iter_num": maxsize, // Maximum number of iterations of the algorithm (maximum number of layers that may be reverted back to full precision); maxsize presumably means Python's sys.maxsize (no practical limit), so write an integer here in an actual JSON file
      "maximal_drop": 0.005,    // Maximum accuracy drop which has to be achieved after the quantization
      "drop_type": "absolute", // Drop type of the accuracy metric: relative or absolute (default)
      "use_prev_if_drop_increase": false,    // Whether to use NN snapshot from the previous algorithm iteration in case if drop increases
      "base_algorithm": "DefaultQuantization" // Base algorithm that is used to quantize model at the beginning
}

Post-training Optimization Toolkit API

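The toolkit makes the optimization algorithms available through an API. This means that users need to embed the optimization code into their own inference pipeline, which is usually the validation script of the full-precision model. Here we describe how to do this using an ImageNet classification task as an example.
To use the optimization features, the following interfaces required by the optimization process should be implemented: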

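Engine: a custom engine class that runs model inference. We created this class based on the DLDT Inference Engine (IE) asynchronous API; it can also be reused in user applications. An example of this engine can be found in the engines folder of the compression directory.
DataLoader: responsible for loading the calibration dataset. An example ImageNet DataLoader can be found in the sample folder.
Metric: required if an accuracy-aware optimization method (for example, the AccuracyAwareQuantization algorithm) is used; it implements the accuracy metric computation. An example "Top-1 accuracy" metric can be found in the sample folder.
Loss: used only if the optimization method requires a per-sample loss computation.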
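The division of responsibilities among these interfaces can be illustrated with plain Python classes. This is a hedged sketch: the class and method names (DataLoader, Metric, ListDataLoader, Top1Accuracy, update, value, reset, run_validation) are illustrative assumptions modelled on the roles described above, not the exact base classes shipped with POT, and the engine is reduced to a callable that maps one input to class scores.

```python
from abc import ABC, abstractmethod


class DataLoader(ABC):
    """Provides (annotation, input) pairs by index."""

    @abstractmethod
    def __len__(self):
        ...

    @abstractmethod
    def __getitem__(self, index):
        ...


class Metric(ABC):
    """Accumulates an accuracy metric over model outputs."""

    @abstractmethod
    def update(self, output, target):
        ...

    @property
    @abstractmethod
    def value(self):
        ...

    @abstractmethod
    def reset(self):
        ...


class ListDataLoader(DataLoader):
    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        return self._samples[index]


class Top1Accuracy(Metric):
    def __init__(self):
        self.reset()

    def update(self, output, target):
        predicted = max(range(len(output)), key=output.__getitem__)  # argmax
        self._correct += int(predicted == target)
        self._total += 1

    @property
    def value(self):
        return self._correct / self._total if self._total else 0.0

    def reset(self):
        self._correct = 0
        self._total = 0


def run_validation(infer, loader, metric):
    """Engine role: run inference on every sample and feed the metric."""
    metric.reset()
    for index in range(len(loader)):
        target, sample = loader[index]
        metric.update(infer(sample), target)
    return metric.value


def toy_infer(x):
    """Two-class "model" with made-up weights."""
    return [1.5 - x, x - 1.5]


loader = ListDataLoader([(0, 1.0), (1, 2.0), (1, 1.0)])
print(run_validation(toy_infer, loader, Top1Accuracy()))
```

A Loss interface would follow the same pattern as Metric but return a per-sample value; it is omitted here because only some optimization methods need it.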

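A sample that demonstrates quantization of a classification model, implemented with the API described above, can be found in the sample folder.
How to Run the Sample

In the commands below, the Post-Training Optimization Toolkit directory <INSTALL_DIR>/deployment_tools/tools/post_training_optimization_toolkit is referred to as <POT_DIR>, where <INSTALL_DIR> is your installation directory.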

cd <POT_DIR>/libs/open_model_zoo/tools/downloader

Launch the downloader tool to download a model from the Open Model Zoo repository:

python3 downloader.py --name <MODEL_NAME>

Launch the converter tool to generate an IRv10 model:

python3 converter.py --name <MODEL_NAME> --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py

Move to the sample folder and launch the sample script:

cd <POT_DIR>/sample
python3 sample.py -m <PATH_TO_IR_XML> -a <IMAGENET_ANNOTATION_FILE> -d <IMAGENET_IMAGES>

Optional: you can use -w, --weights to specify the path to the weights.
Defining the Configuration File

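The toolkit is designed to work with a configuration file that specifies all the parameters required for optimization. These parameters are organized as a dictionary and stored in a JSON file. The JSON file may contain comments, as supported by the jstyleson Python package. Logically, all parameters are divided into 3 groups: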

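Model parameters: parameters related to the model definition (e.g. model name, model path, etc.)
Engine parameters: parameters that define the engine responsible for model inference and data preparation used for optimization and evaluation (e.g. preprocessing parameters, dataset path, etc.)
Compression parameters: information related to the optimization algorithms (e.g. algorithm name and algorithm-specific parameters)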

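Model Parameters

This section contains only 3 parameters:

"model_name": the model name, e.g. "MobileNetV2"
"model": a string parameter that defines the path to the input model topology (.xml)
"weights": a string parameter that defines the path to the input model weights (.bin)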

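Engine Parameters

When running inference of a DL model on a dataset, the toolkit relies on the Deep Learning accuracy validation framework (AccuracyChecker). Therefore, there are two ways to define its parameters:

Refer to an existing AccuracyChecker configuration file represented by a YAML file. It can be the file used to validate the full-precision model. In this case, only one parameter should be defined:
- "config": the path to the AccuracyChecker configuration file
Define all required AccuracyChecker parameters directly in the JSON file. For more details, refer to the corresponding AccuracyChecker documentation or to one of the example configuration files of this kind provided with the toolkit.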

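Compression Parameters

This section defines the optimization algorithms and their parameters, which have already been described above.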
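Putting the three groups together, a configuration file can be produced from Python with the standard json module. The section keys "model", "engine" and "compression", and the "algorithms" list inside "compression", follow the layout of the example configurations shipped with the toolkit as far as we know; treat them, and all the placeholder paths below, as assumptions to check against the files in configs/examples.

```python
import json


def make_pot_config(model_name, model_xml, weights_bin, ac_config, algorithms):
    """Assemble the model / engine / compression groups into one dictionary."""
    return {
        "model": {
            "model_name": model_name,   # e.g. "MobileNetV2"
            "model": model_xml,         # path to the IR topology (.xml)
            "weights": weights_bin,     # path to the IR weights (.bin)
        },
        "engine": {
            "config": ac_config,        # path to an AccuracyChecker YAML config
        },
        "compression": {
            "algorithms": algorithms,   # assumed key, see the lead-in
        },
    }


config = make_pot_config(
    "MobileNetV2",
    "mobilenet_v2.xml",      # placeholder paths
    "mobilenet_v2.bin",
    "mobilenet_v2.yaml",
    [{"name": "DefaultQuantization",
      "params": {"preset": "performance", "stat_subset_size": 300}}],
)
text = json.dumps(config, indent=4)
print(text)
```

Plain JSON without comments is also accepted by a comment-aware parser such as jstyleson, so the generated text can be saved to a file and passed to the tool with python3 main.py -c <PATH_TO_POT_CONFIG>.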
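Configuration File Examples

For a quick start, configuration file examples are provided for several popular DL models. They are located in <INSTALL_DIR>/deployment_tools/tools/post_training_optimization_toolkit/configs/examples, where <INSTALL_DIR> is the OpenVINO installation directory. See the next section for details on how to run the Post-Training Optimization Toolkit with an example configuration file.
Running an Example

Follow the steps below to run the Post-Training Optimization Toolkit with one of the example configuration files provided with the Intel OpenVINO toolkit distribution.
In the commands below, the Post-Training Optimization Toolkit directory <INSTALL_DIR>/deployment_tools/tools/post_training_optimization_toolkit is referred to as <POT_DIR>, where <INSTALL_DIR> is your installation directory.
The example configuration files are located in the <POT_DIR>/configs/examples directory.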

cd <POT_DIR>/libs/open_model_zoo/tools/downloader

Launch the downloader tool to download a model from the Open Model Zoo repository:

python3 downloader.py --name <MODEL_NAME>

Launch the converter tool to generate an IRv10 model:

python3 converter.py --name <MODEL_NAME> --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py

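In the example configuration file you want to run, update the model and weights fields.
If necessary, update the config field of the chosen example configuration file with the path to the AccuracyChecker configuration file. If you want to use the Open Model Zoo configuration files, update the dataset definition file <POT_DIR>/libs/open_model_zoo/tools/accuracy_checker/dataset_definitions.yml. If you have customized the predefined engine section, override the dataset and annotation paths where the POT configuration requires them.
Update the dataset definition file <POT_DIR>/libs/open_model_zoo/tools/accuracy_checker/dataset_definitions.yml with the required paths to the datasets (if you do not have a predefined "engine" section).
From the <POT_DIR> directory, launch the Post-Training Optimization Toolkit with the configuration file: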

cd <POT_DIR>
python3 main.py -c <PATH_TO_POT_CONFIG>

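Welcome to follow GiantPandaCV, where you will find exclusive deep learning content. We insist on original work and share the new things we learn every day. ( ω )
If you have questions about this article or would like to join the discussion group, feel free to add BBuf on WeChat:
https://u.wechat.com/MPWFDnmCPu6zgf5YUtdpT_U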
