{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 通用量化模型\n", "\n", "```{rubric} 模型对比\n", "```\n", "\n", "类型|大小(MB)|accuracy($\\%$)\n", ":-|:-|:-\n", "浮点|9.188|95.09\n", "浮点融合|8.924|95.09\n", "QAT|2.657|94.86\n", "\n", "```{rubric} 不同 QConfig 的静态 PTQ 模型\n", "```\n", "\n", "accuracy($\\%$)|激活|权重|\n", ":-|:-|:-\n", "|40.51|{data}`~torch.ao.quantization.observer.MinMaxObserver`.`with_args(quant_min=0, quant_max=127)`|{data}`~torch.ao.quantization.observer.MinMaxObserver`.`with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)`\n", "75.68|{data}`~torch.ao.quantization.observer.HistogramObserver`.`with_args(quant_min=0, quant_max=127)`|{data}`~torch.ao.quantization.observer.PerChannelMinMaxObserver`.`with_args(dtype=torch.qint8, qscheme=torch.per_channel_symmetric)`\n", "\n", "加载一些库:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from torch import nn, jit\n", "from torch.ao.quantization.qconfig import default_qconfig\n", "from torch.ao.quantization.qconfig import get_default_qat_qconfig, get_default_qconfig\n", "from torch.ao.quantization.quantize import prepare, convert, prepare_qat\n", "import torch\n", "\n", "# 设置 warnings\n", "import warnings\n", "warnings.filterwarnings(\n", " action='ignore',\n", " category=DeprecationWarning,\n", " module='.*'\n", ")\n", "warnings.filterwarnings(\n", " action='ignore',\n", " module='torch.ao.quantization'\n", ")\n", "# 载入自定义模块\n", "from mod import load_mod\n", "load_mod()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "设置模型保存路径和超参数:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "saved_model_dir = 'models/'\n", "float_model_file = 'mobilenet_pretrained_float.pth'\n", "scripted_float_model_file = 'mobilenet_float_scripted.pth'\n", "scripted_ptq_model_file = 'mobilenet_ptq_scripted.pth'\n", "scripted_quantized_model_file = 'mobilenet_quantization_scripted_quantized.pth'\n", "scripted_qat_model_file = 'mobilenet_qat_scripted_quantized.pth'\n", "# 超参数\n", "learning_rate = 5e-5\n", "num_epochs = 30\n", "batch_size = 16\n", "num_classes = 10\n", "# train_batch_size = 30\n", "# eval_batch_size = 50\n", "\n", "# 设置评估策略\n", "criterion = nn.CrossEntropyLoss()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 辅助函数\n", "\n", "接下来,我们定义几个[帮助函数](https://github.com/pytorch/examples/blob/master/imagenet/main.py)来帮助评估模型。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from helper import evaluate, print_size_of_model, load_model\n", "\n", "\n", "def print_info(model, model_type='浮点模型'):\n", " '''打印信息'''\n", " print_size_of_model(model)\n", " top1, top5 = evaluate(model, criterion, test_iter)\n", " print(f'\\n{model_type}:\\n\\t'\n", " f'在 {num_eval} 张图片上评估 accuracy 为: {top1.avg:2.5f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 定义数据集和数据加载器\n", "\n", "作为最后一个主要的设置步骤,我们为训练和测试集定义了数据加载器。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from xinet import CV\n", "\n", "# 为了 cifar10 匹配 ImageNet,需要将其 resize 到 224\n", "train_iter, test_iter = CV.load_data_cifar10(batch_size=batch_size,\n", " resize=224)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "查看数据集的 batch 次数:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "训练、测试批次分别为: 3125 625\n" ] } ], "source": [ "print('训练、测试批次分别为:',\n", " len(train_iter), 
len(test_iter))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "获取训练和测试数据集的大小:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(50000, 10000)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "num_train = sum(len(ys) for _, ys in train_iter)\n", "num_eval = sum(len(ys) for _, ys in test_iter)\n", "num_train, num_eval" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 微调浮点模型\n", "\n", "配置浮点模型:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from torchvision.models.quantization import mobilenet_v2\n", "\n", "# 定义模型\n", "def create_model(quantize=False,\n", " num_classes=10,\n", " pretrained=False):\n", " float_model = mobilenet_v2(pretrained=pretrained,\n", " quantize=quantize)\n", " # 匹配 ``num_classes``\n", " float_model.classifier[1] = nn.Linear(float_model.last_channel,\n", " num_classes)\n", " return float_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "定义模型:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "float_model = create_model(pretrained=True,\n", " quantize=False,\n", " num_classes=num_classes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "微调浮点模型:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loss 0.012, train acc 0.996, test acc 0.951\n", "338.8 examples/sec on cuda:0\n" ] }, { "data": { "image/svg+xml": "\n\n\n \n \n \n \n 2022-03-23T15:22:42.159935\n image/svg+xml\n \n \n Matplotlib v3.4.0, https://matplotlib.org/\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "CV.train_fine_tuning(float_model, train_iter, test_iter,\n", " learning_rate=learning_rate,\n", " num_epochs=num_epochs,\n", " device='cuda:0',\n", " param_group=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "保存模型:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "torch.save(float_model.state_dict(), saved_model_dir + float_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 配置可量化模型\n", "\n", "加载浮点模型:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "float_model = create_model(quantize=False,\n", " num_classes=num_classes)\n", "float_model = load_model(float_model, saved_model_dir + float_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "查看浮点模型的信息:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "模型大小:9.187789 MB\n", "\n", "浮点模型:\n", "\t在 10000 张图片上评估 accuracy 为: 95.09000\n" ] } ], "source": [ "print_info(float_model, model_type='浮点模型')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{rubric} 融合模块\n", "```\n", "\n", "融合模块既可以节省内存访问,使模型更快,同时也提高了数值精度。虽然这可以用于任何模型,但这在量化模型中尤其常见。\n", "\n", "可以先查看融合前的 inverted residual 块:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", " (0): ConvNormActivation(\n", " (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)\n", " (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (2): ReLU()\n", " )\n", " (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", ")" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "float_model.features[1].conv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "启用评估模式:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "float_model.eval();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "融合模块:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "float_model.fuse_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "查看融合后的 inverted residual 块:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", " (0): ConvNormActivation(\n", " (0): ConvReLU2d(\n", " (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)\n", " (1): ReLU()\n", " )\n", " (1): Identity()\n", " (2): Identity()\n", " )\n", " (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1))\n", " (2): Identity()\n", ")" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "float_model.features[1].conv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "最后,为了得到“基线”精度,让我们看看融合模块的非量化模型的精度:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "baseline 模型大小\n", "模型大小:8.923757 MB\n" ] } ], "source": [ "model_type = '融合后的浮点模型'\n", "print(\"baseline 模型大小\")\n", "print_size_of_model(float_model)\n", "\n", "top1, top5 = evaluate(float_model, criterion, test_iter)" ] }, { 
"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "融合后的浮点模型:\n", "\t在 10000 张图片上评估 accuracy 为: 95.090\n" ] } ], "source": [ "print(f'\\n{model_type}:\\n\\t在 {num_eval} 张图片上评估 accuracy 为: {top1.avg:2.3f}')\n", "# 保存\n", "jit.save(jit.script(float_model), saved_model_dir + scripted_float_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这将是我们进行比较的基准。接下来,让我们尝试不同的量化方法。\n", "\n", "## 静态量化后训练\n", "\n", "训练后的静态量化(Post Training Quantization,简称 PTQ)不仅包括将权重从 float 转换为int,就像在动态量化中一样,还包括执行额外的步骤,即首先通过网络输入一批数据,并计算不同激活的结果分布(具体来说,这是通过在记录数据的不同点插入观测者模块来实现的)。然后使用这些分布来确定如何在推断时量化不同的激活(一种简单的技术是将整个激活范围划分为 256 个级别,但我们也支持更复杂的方法)。重要的是,这个额外的步骤允许我们在运算之间传递量化的值,而不是在每个运算之间将这些值转换为浮点数(然后再转换为整数),从而显著提高了速度。" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# 加载模型\n", "myModel = create_model(pretrained=False,\n", " quantize=False,\n", " num_classes=num_classes)\n", "float_model = load_model(myModel,\n", " saved_model_dir + float_model_file)\n", "myModel.eval()\n", "\n", "# 融合\n", "myModel.fuse_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "指定量化配置(从简单的最小/最大范围估计和加权的逐张量量化开始):" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "QConfig(activation=functools.partial(, quant_min=0, quant_max=127){}, weight=functools.partial(, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myModel.qconfig = default_qconfig\n", "myModel.qconfig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "开始校准准备:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PTQ 准备:插入观测者\n", "\n", " 查看观测者插入后的 inverted residual \n", "\n", " Sequential(\n", " (0): ConvNormActivation(\n", " (0): ConvReLU2d(\n", " (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)\n", " (1): ReLU()\n", " (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)\n", " )\n", " (1): Identity()\n", " (2): Identity()\n", " )\n", " (1): Conv2d(\n", " 32, 16, kernel_size=(1, 1), stride=(1, 1)\n", " (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)\n", " )\n", " (2): Identity()\n", ")\n" ] } ], "source": [ "print('PTQ 准备:插入观测者')\n", "prepare(myModel, inplace=True)\n", "print('\\n 查看观测者插入后的 inverted residual \\n\\n',\n", " myModel.features[1].conv)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "用数据集校准:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "PTQ:校准完成!\n" ] } ], "source": [ "num_calibration_batches = -1 # 取全部训练集做校准\n", "evaluate(myModel, criterion, train_iter, neval_batches=num_calibration_batches)\n", "print('\\nPTQ:校准完成!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "转换为量化模型:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PTQ:转换完成!\n" ] } ], "source": [ "convert(myModel, inplace=True)\n", "print('PTQ:转换完成!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "融合并量化后,查看融合模块的 Inverted Residual 块:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", " (0): ConvNormActivation(\n", " (0): QuantizedConvReLU2d(32, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Prepare for calibration by inserting observers:" ] },
{ "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PTQ preparation: inserting observers\n", "\n", " Inverted residual block after observer insertion \n", "\n", " Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): ConvReLU2d(\n", "      (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)\n", "      (1): ReLU()\n", "      (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)\n", "    )\n", "    (1): Identity()\n", "    (2): Identity()\n", "  )\n", "  (1): Conv2d(\n", "    32, 16, kernel_size=(1, 1), stride=(1, 1)\n", "    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)\n", "  )\n", "  (2): Identity()\n", ")\n" ] } ], "source": [
 "print('PTQ preparation: inserting observers')\n",
 "prepare(myModel, inplace=True)\n",
 "print('\\n Inverted residual block after observer insertion \\n\\n',\n",
 "      myModel.features[1].conv)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Calibrate with the dataset:" ] },
{ "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "PTQ: calibration done!\n" ] } ], "source": [
 "num_calibration_batches = -1  # use the entire training set for calibration\n",
 "evaluate(myModel, criterion, train_iter, neval_batches=num_calibration_batches)\n",
 "print('\\nPTQ: calibration done!')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Convert to a quantized model:" ] },
{ "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PTQ: conversion done!\n" ] } ], "source": [
 "convert(myModel, inplace=True)\n",
 "print('PTQ: conversion done!')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "After fusion and quantization, inspect the fused inverted residual block:" ] },
{ "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): QuantizedConvReLU2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=0.18101467192173004, zero_point=0, padding=(1, 1), groups=32)\n", "    (1): Identity()\n", "    (2): Identity()\n", "  )\n", "  (1): QuantizedConv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), scale=0.2768715023994446, zero_point=81)\n", "  (2): Identity()\n", ")" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myModel.features[1].conv" ] },
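{ "cell_type": "markdown", "metadata": {}, "source": [ "The `scale` and `zero_point` shown above are the affine quantization parameters derived from the calibrated min/max ranges: $x \\approx (x_q - \\mathrm{zero\\_point}) \\times \\mathrm{scale}$. A small round trip with `torch.quantize_per_tensor` makes this concrete (the parameter values below are arbitrary, chosen only for illustration):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "x = torch.tensor([-1.0, 0.0, 0.5, 2.0])\n",
 "scale, zero_point = 0.02, 64  # arbitrary illustrative parameters\n",
 "xq = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point,\n",
 "                               dtype=torch.quint8)\n",
 "print(xq.int_repr())    # stored uint8 values: round(x / scale) + zero_point\n",
 "print(xq.dequantize())  # recovered floats, showing the quantization error" ] },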
{ "cell_type": "markdown", "metadata": {}, "source": [ "Model size after quantization:" ] },
{ "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model size: 2.356113 MB\n" ] } ], "source": [ "print_size_of_model(myModel)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate:" ] },
{ "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "PTQ model:\n", "\taccuracy evaluated on 10000 images: 40.51\n" ] } ], "source": [
 "model_type = 'PTQ model'\n",
 "top1, top5 = evaluate(myModel, criterion, test_iter)\n",
 "print(f'\\n{model_type}:\\n\\taccuracy evaluated on {num_eval} images: {top1.avg:2.2f}')\n",
 "jit.save(jit.script(myModel), saved_model_dir + scripted_ptq_model_file)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
 "The accuracy drops this far because we used simple min/max observers to determine the quantization parameters. Nevertheless, we did shrink the model to under 2.36 MB, almost a 4x reduction.\n",
 "\n",
 "In addition, we can significantly improve the accuracy simply by using a different quantization configuration (here we repeat the same exercise with the configuration recommended for quantizing on x86 architectures, `fbgemm`). That configuration:\n",
 "\n",
 "- quantizes weights on a per-channel basis;\n",
 "- uses a histogram observer that collects a histogram of activations and then picks the quantization parameters in an optimal way." ] },
{ "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [
 "per_channel_quantized_model = create_model(quantize=False,\n",
 "                                           num_classes=num_classes)\n",
 "per_channel_quantized_model = load_model(per_channel_quantized_model,\n",
 "                                         saved_model_dir + float_model_file)\n",
 "per_channel_quantized_model.eval()\n",
 "per_channel_quantized_model.fuse_model()\n",
 "per_channel_quantized_model.qconfig = get_default_qconfig('fbgemm')\n",
 "per_channel_quantized_model.qconfig" ] },
{ "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [
 "num_calibration_batches = 200  # use only 200 batches for calibration\n",
 "prepare(per_channel_quantized_model, inplace=True)\n",
 "evaluate(per_channel_quantized_model, criterion,\n",
 "         train_iter, num_calibration_batches)\n",
 "\n",
 "model_type = 'PTQ model (histogram observer)'\n",
 "convert(per_channel_quantized_model, inplace=True)\n",
 "top1, top5 = evaluate(per_channel_quantized_model, criterion, test_iter)" ] },
{ "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "PTQ model (histogram observer):\n", "\taccuracy evaluated on 10000 images: 75.68\n" ] } ], "source": [
 "print(f'\\n{model_type}:\\n\\taccuracy evaluated on {num_eval} images: {top1.avg:2.2f}')\n",
 "jit.save(jit.script(per_channel_quantized_model),\n",
 "         saved_model_dir + scripted_quantized_model_file)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
 "Changing just the quantization configuration raises the accuracy to over $75.6\\%$! Even so, this is still about $19$ percentage points below the baseline of $95.09\\%$. Let's try quantization-aware training.\n",
 "\n",
 "## Quantization-Aware Training\n",
 "\n",
 "Quantization-aware training (QAT) is the quantization method that typically achieves the highest accuracy. With QAT, all weights and activations are \"fake quantized\" during both the forward and backward passes of training: that is, float values are rounded to mimic int8 values, but all computations are still done in floating point. All the weight adjustments during training are therefore made while \"aware\" that the model will ultimately be quantized; after quantization, this method usually yields higher accuracy than either dynamic quantization or post-training static quantization.\n",
 "\n",
 "The overall workflow for actually performing QAT is very similar to before:\n",
 "\n",
 "- We can use the same model as before: no additional preparation is needed for quantization-aware training.\n",
 "- We need a `qconfig` that specifies what kind of fake quantization to insert after weights and activations, instead of specifying observers." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "def create_qat_model(num_classes,\n",
 "                     model_path,\n",
 "                     quantize=False,\n",
 "                     backend='fbgemm'):\n",
 "    qat_model = create_model(quantize=quantize,\n",
 "                             num_classes=num_classes)\n",
 "    qat_model = load_model(qat_model, model_path)\n",
 "    qat_model.fuse_model()\n",
 "    qat_model.qconfig = get_default_qat_qconfig(backend=backend)\n",
 "    return qat_model" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Finally, `prepare_qat` inserts the fake-quantization modules, preparing the model for quantization-aware training:" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [
 "qat_model = create_qat_model(num_classes,\n",
 "                             saved_model_dir + float_model_file)\n",
 "qat_model = prepare_qat(qat_model)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The inverted residual block after QAT preparation; note the fake-quantization modules:" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): ConvBnReLU2d(\n", "      32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False\n", "      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", "      (weight_fake_quant): FusedMovingAvgObsFakeQuantize(\n", "        fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.qint8, quant_min=-128, quant_max=127, qscheme=torch.per_channel_symmetric, reduce_range=False\n", "        (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))\n", "      )\n", "      (activation_post_process): FusedMovingAvgObsFakeQuantize(\n", "        fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True\n", "        (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)\n", "      )\n", "    )\n", "    (1): Identity()\n", "    (2): Identity()\n", "  )\n", "  (1): ConvBn2d(\n", "    32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False\n", "    (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", "    (weight_fake_quant): FusedMovingAvgObsFakeQuantize(\n", "      fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.qint8, quant_min=-128, quant_max=127, qscheme=torch.per_channel_symmetric, reduce_range=False\n", "      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))\n", "    )\n", "    (activation_post_process): FusedMovingAvgObsFakeQuantize(\n", "      fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True\n", "      (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)\n", "    )\n", "  )\n", "  (2): Identity()\n", ")" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qat_model.features[1].conv" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
 "Training a quantized model to high accuracy requires accurate modeling of the numerics at inference time. For quantization-aware training, we therefore modify the training loop as follows (see the sketch after this list):\n",
 "\n",
 "- Switch batch norm to use the running mean and variance towards the end of training, to better match inference numerics.\n",
 "- Freeze the quantizer parameters (scale and zero point) and fine-tune the weights." ] },
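{ "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of such a loop, assuming a hypothetical `train_one_epoch` helper and illustrative epoch thresholds (the actual fine-tuning below is done by `CV.train_fine_tuning`, which handles this internally):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "num_qat_epochs = 8  # illustrative\n",
 "\n",
 "for epoch in range(num_qat_epochs):\n",
 "    train_one_epoch(qat_model, criterion, train_iter)  # hypothetical helper\n",
 "    if epoch > 2:\n",
 "        # Freeze batch norm running mean and variance.\n",
 "        qat_model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)\n",
 "    if epoch > 3:\n",
 "        # Freeze the quantizer parameters (scale and zero point).\n",
 "        qat_model.apply(torch.ao.quantization.disable_observer)" ] },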
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8+loss 0.011, train acc 0.996, test acc 0.947\n", "180.9 examples/sec on cuda:2\n" ] }, { "data": { "image/svg+xml": "\n\n\n \n \n \n \n 2022-03-24T18:35:16.393509\n image/svg+xml\n \n \n Matplotlib v3.4.0, https://matplotlib.org/\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "CV.train_fine_tuning(qat_model, train_iter, test_iter,\n", " learning_rate=learning_rate,\n", " num_epochs=30,\n", " device='cuda:2',\n", " param_group=True,\n", " is_freeze=False,\n", " is_quantized_acc=False,\n", " need_prepare=False,\n", " ylim=[0.8, 1])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "convert(qat_model.cpu().eval(), inplace=True)\n", "qat_model.eval();" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "模型大小:2.656573 MB\n", "\n", "QAT 模型:\n", "\t在 10000 张图片上评估 accuracy 为: 94.86000\n" ] } ], "source": [ "print_info(qat_model,'QAT 模型')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "量化感知训练在整个 imagenet 数据集上的准确率超过 $94.86\\%$,接近浮点精度 $95.09\\%$。\n", "\n", "更多关于 QAT 的内容:\n", "\n", "- QAT 是后训练量化技术的超集,允许更多的调试。例如,我们可以分析模型的准确性是否受到权重或激活量化的限制。\n", "- 也可以在浮点上模拟量化模型的准确性,因为使用伪量化来模拟实际量化算法的数值。\n", "- 也可以很容易地模拟训练后量化。\n", "\n", "保存 QAT 模型:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "jit.save(jit.script(qat_model), saved_model_dir + scripted_qat_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 量化加速(待更,当前有问题)\n", "\n", "最后,确认上面提到的一些事情:量化模型实际上执行推断更快吗?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Elapsed time: 15 ms\n", "Elapsed time: 219 ms\n" ] }, { "data": { "text/plain": [ "17.546305894851685" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import time\n", "\n", "def run_benchmark(model_file, img_loader):\n", " elapsed = 0\n", " model = torch.jit.load(model_file)\n", " model.eval()\n", " num_batches = 5\n", " # Run the scripted model on a few batches of images\n", " for i, (images, target) in enumerate(img_loader):\n", " if i < num_batches:\n", " start = time.time()\n", " output = model(images)\n", " end = time.time()\n", " elapsed = elapsed + (end-start)\n", " else:\n", " break\n", " num_images = images.size()[0] * num_batches\n", "\n", " print('Elapsed time: %3.0f ms' % (elapsed/num_images*1000))\n", " return elapsed\n", "\n", "run_benchmark(saved_model_dir + scripted_float_model_file, test_iter)\n", "run_benchmark(saved_model_dir + scripted_qat_model_file, test_iter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在本地运行,常规模型的速度为 15 毫秒,量化模型的速度仅为 20 毫秒,这说明了量化模型与浮点模型相比,典型的 2-4 倍的加速。" ] } ], "metadata": { "interpreter": { "hash": "78526419bf48930935ba7e23437b2460cb231485716b036ebb8701887a294fa8" }, "kernelspec": { "display_name": "Python 3.10.0 ('torchx')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }