{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# General Model Quantization\n", "\n", "```{rubric} Model comparison\n", "```\n", "\n", "Type|Size (MB)|Accuracy ($\\%$)\n", ":-|:-|:-\n", "Float|9.188|95.09\n", "Float (fused)|8.924|95.09\n", "QAT|2.657|94.86\n", "\n", "```{rubric} Static PTQ models with different QConfigs\n", "```\n", "\n", "Accuracy ($\\%$)|Activation|Weight\n", ":-|:-|:-\n", "40.51|{data}`~torch.ao.quantization.observer.MinMaxObserver`.`with_args(quant_min=0, quant_max=127)`|{data}`~torch.ao.quantization.observer.MinMaxObserver`.`with_args(dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)`\n", "75.68|{data}`~torch.ao.quantization.observer.HistogramObserver`.`with_args(quant_min=0, quant_max=127)`|{data}`~torch.ao.quantization.observer.PerChannelMinMaxObserver`.`with_args(dtype=torch.qint8, qscheme=torch.per_channel_symmetric)`\n", "\n", "Load the required libraries:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from torch import nn, jit\n", "from torch.ao.quantization.qconfig import default_qconfig\n", "from torch.ao.quantization.qconfig import get_default_qat_qconfig, get_default_qconfig\n", "from torch.ao.quantization.quantize import prepare, convert, prepare_qat\n", "import torch\n", "\n", "# Configure warnings\n", "import warnings\n", "warnings.filterwarnings(\n", "    action='ignore',\n", "    category=DeprecationWarning,\n", "    module='.*'\n", ")\n", "warnings.filterwarnings(\n", "    action='ignore',\n", "    module='torch.ao.quantization'\n", ")\n", "# Load the custom module\n", "from mod import load_mod\n", "load_mod()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the model save paths and hyperparameters:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "saved_model_dir = 'models/'\n", "float_model_file = 'mobilenet_pretrained_float.pth'\n", "scripted_float_model_file = 'mobilenet_float_scripted.pth'\n", "scripted_ptq_model_file = 'mobilenet_ptq_scripted.pth'\n", "scripted_quantized_model_file = 'mobilenet_quantization_scripted_quantized.pth'\n", 
"scripted_qat_model_file = 'mobilenet_qat_scripted_quantized.pth'\n", "# Hyperparameters\n", "learning_rate = 5e-5\n", "num_epochs = 30\n", "batch_size = 16\n", "num_classes = 10\n", "# train_batch_size = 30\n", "# eval_batch_size = 50\n", "\n", "# Set the evaluation criterion\n", "criterion = nn.CrossEntropyLoss()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper functions\n", "\n", "Next, we define several [helper functions](https://github.com/pytorch/examples/blob/master/imagenet/main.py) to help evaluate the model." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from helper import evaluate, print_size_of_model, load_model\n", "\n", "\n", "def print_info(model, model_type='Float model'):\n", "    '''Print the model size and its accuracy on the test set.'''\n", "    print_size_of_model(model)\n", "    top1, top5 = evaluate(model, criterion, test_iter)\n", "    print(f'\\n{model_type}:\\n\\t'\n", "          f'Accuracy evaluated on {num_eval} images: {top1.avg:2.5f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the dataset and data loaders\n", "\n", "As the last major setup step, we define the data loaders for the training and test sets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from xinet import CV\n", "\n", "# Resize CIFAR-10 images to 224 to match the ImageNet input size\n", "train_iter, test_iter = CV.load_data_cifar10(batch_size=batch_size,\n", "                                             resize=224)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the number of batches in each dataset:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of training and test batches: 3125 625\n" ] } ], "source": [ "print('Number of training and test batches:',\n", "      len(train_iter), len(test_iter))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the sizes of the training and test datasets:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(50000, 10000)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "num_train = sum(len(ys) for _, ys in train_iter)\n", "num_eval = sum(len(ys) for _, ys in test_iter)\n", "num_train, num_eval" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fine-tune the float model\n", "\n", "Configure the float model:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from torchvision.models.quantization import mobilenet_v2\n", "\n", "# Define the model\n", "def create_model(quantize=False,\n", "                 num_classes=10,\n", "                 pretrained=False):\n", "    float_model = mobilenet_v2(pretrained=pretrained,\n", "                               quantize=quantize)\n", "    # Match ``num_classes``\n", "    float_model.classifier[1] = nn.Linear(float_model.last_channel,\n", "                                          num_classes)\n", "    return float_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the model:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "float_model = create_model(pretrained=True,\n", "                           quantize=False,\n", "                           num_classes=num_classes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fine-tune the float model:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loss 0.012, train acc 0.996, test acc 0.951\n", "338.8 examples/sec on cuda:0\n" ] } ], "source": [ "CV.train_fine_tuning(float_model, train_iter, test_iter,\n", "                     learning_rate=learning_rate,\n", "                     num_epochs=num_epochs,\n", "                     device='cuda:0',\n", "                     param_group=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Save the model:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "torch.save(float_model.state_dict(), saved_model_dir + float_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configure the quantizable model\n", "\n", "Load the float model:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "float_model = create_model(quantize=False,\n", "                           num_classes=num_classes)\n", "float_model = load_model(float_model, saved_model_dir + float_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the float model:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model size: 9.187789 MB\n", "\n", "Float model:\n", "\tAccuracy evaluated on 10000 images: 95.09000\n" ] } ], "source": [ "print_info(float_model, model_type='Float model')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{rubric} Fusing modules\n", "```\n", "\n", "Fusing modules saves memory accesses, which makes the model faster, and also improves numerical accuracy. Although fusion can be applied to any model, it is especially common with quantized models.\n", "\n", "First, inspect the inverted residual block before fusion:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)\n", "    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", "    (2): ReLU()\n", "  )\n", "  (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", "  (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", ")" ] }, "execution_count": 10, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "float_model.features[1].conv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Switch to evaluation mode:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "float_model.eval();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fuse the modules:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "float_model.fuse_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the inverted residual block after fusion:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): ConvReLU2d(\n", "      (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)\n", "      (1): ReLU()\n", "    )\n", "    (1): Identity()\n", "    (2): Identity()\n", "  )\n", "  (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1))\n", "  (2): Identity()\n", ")" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "float_model.features[1].conv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, to get a “baseline” accuracy, let's look at the accuracy of the unquantized model with fused modules:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Baseline model size\n", "Model size: 8.923757 MB\n" ] } ], "source": [ "model_type = 'Fused float model'\n", "print(\"Baseline model size\")\n", "print_size_of_model(float_model)\n", "\n", "top1, top5 = evaluate(float_model, criterion, test_iter)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Fused float model:\n", "\tAccuracy evaluated on 10000 images: 95.090\n" ] } ], "source": [ "print(f'\\n{model_type}:\\n\\tAccuracy evaluated on {num_eval} images: {top1.avg:2.3f}')\n", "# Save the scripted model\n", "jit.save(jit.script(float_model), saved_model_dir + scripted_float_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"This will be our baseline for comparison. Next, let's try different quantization methods.\n", "\n", "## Post-training static quantization\n", "\n", "Post-training static quantization (PTQ) involves not just converting the weights from float to int, as in dynamic quantization, but also an additional step: first feeding batches of data through the network and computing the resulting distributions of the different activations (specifically, this is done by inserting observer modules at different points that record this data). These distributions are then used to determine how the different activations should be quantized at inference time (a simple technique is to divide the entire range of activations into 256 levels, but more sophisticated methods are supported as well). Importantly, this additional step allows us to pass quantized values between operations instead of converting them to floats, and then back to ints, between every operation, which results in a significant speedup." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Load the model\n", "myModel = create_model(pretrained=False,\n", "                       quantize=False,\n", "                       num_classes=num_classes)\n", "float_model = load_model(myModel,\n", "                         saved_model_dir + float_model_file)\n", "myModel.eval()\n", "\n", "# Fuse the modules\n", "myModel.fuse_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify the quantization configuration (start with simple min/max range estimation and per-tensor quantization of weights):" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, quant_min=0, quant_max=127){}, weight=functools.partial(<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric){})" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myModel.qconfig = default_qconfig\n", "myModel.qconfig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare for calibration:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PTQ prepare: inserting observers\n", "\n", " Inverted residual block after observer insertion \n", "\n", " Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): ConvReLU2d(\n", "      (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)\n", "      (1): ReLU()\n", "      (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)\n", "    )\n", "    (1): Identity()\n", "    (2): Identity()\n", "  )\n", "  (1): Conv2d(\n", "    32, 16, kernel_size=(1, 1), stride=(1, 1)\n", "    (activation_post_process): MinMaxObserver(min_val=inf, max_val=-inf)\n", "  )\n", "  (2): Identity()\n", ")\n" ] } ], "source": [ "print('PTQ prepare: inserting observers')\n", "prepare(myModel, inplace=True)\n", "print('\\n Inverted residual block after observer insertion \\n\\n',\n", "      myModel.features[1].conv)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calibrate with the dataset:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "PTQ: calibration done!\n" ] } ], "source": [ "num_calibration_batches = -1  # use the whole training set for calibration\n", "evaluate(myModel, criterion, train_iter, neval_batches=num_calibration_batches)\n", "print('\\nPTQ: calibration done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert to a quantized model:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PTQ: conversion done!\n" ] } ], "source": [ "convert(myModel, inplace=True)\n", "print('PTQ: conversion done!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the inverted residual block after fusion and quantization:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): QuantizedConvReLU2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=0.18101467192173004, zero_point=0, padding=(1, 1), groups=32)\n", "    (1): Identity()\n", "    (2): Identity()\n", "  )\n", "  (1): QuantizedConv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), scale=0.2768715023994446, zero_point=81)\n", "  (2): Identity()\n", ")" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myModel.features[1].conv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Size of the quantized model:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model size: 2.356113 MB\n" ] } ], "source": [ "print_size_of_model(myModel)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "\n", "PTQ model:\n", "\tAccuracy evaluated on 10000 images: 40.51\n" ] } ], "source": [ "model_type = 'PTQ model'\n", "top1, top5 = evaluate(myModel, criterion, test_iter)\n", "print(f'\\n{model_type}:\\n\\tAccuracy evaluated on {num_eval} images: {top1.avg:2.2f}')\n", "# Save the quantized model (myModel), not the float baseline\n", "jit.save(jit.script(myModel), saved_model_dir + scripted_ptq_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Accuracy drops to $40.51\\%$ because we used a simple min/max observer to determine the quantization parameters. Nevertheless, the model size shrinks to under 2.36 MB, almost a 4x reduction.\n", "\n", "We can improve accuracy significantly by using a different quantization configuration. We repeat the same exercise with the recommended configuration for x86 architectures (the `fbgemm` backend). This configuration does the following:\n", "\n", "- Quantizes weights on a per-channel basis.\n", "- Uses a histogram observer that collects a histogram of activations and then picks the quantization parameters in an optimal manner." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "QConfig(activation=functools.partial(<class 'torch.ao.quantization.observer.HistogramObserver'>, reduce_range=True){}, weight=functools.partial(<class 'torch.ao.quantization.observer.PerChannelMinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_channel_symmetric){})" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "per_channel_quantized_model = create_model(quantize=False,\n", "                                           num_classes=num_classes)\n", "per_channel_quantized_model = load_model(per_channel_quantized_model,\n", "                                         saved_model_dir + float_model_file)\n", "per_channel_quantized_model.eval()\n", "per_channel_quantized_model.fuse_model()\n", "per_channel_quantized_model.qconfig = get_default_qconfig('fbgemm')\n", "per_channel_quantized_model.qconfig" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "num_calibration_batches = 200  # use only 200 batches for calibration\n", "prepare(per_channel_quantized_model, inplace=True)\n", "evaluate(per_channel_quantized_model, criterion,\n", "         train_iter, num_calibration_batches)\n", "\n", "model_type = 'PTQ model (histogram observer)'\n", "convert(per_channel_quantized_model, inplace=True)\n", "top1, top5 = evaluate(per_channel_quantized_model, criterion, test_iter)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "PTQ model (histogram observer):\n", "\tAccuracy evaluated on 10000 images: 75.68\n" ] } ], "source": [ "print(f'\\n{model_type}:\\n\\tAccuracy evaluated on {num_eval} images: {top1.avg:2.2f}')\n", "jit.save(jit.script(per_channel_quantized_model),\n", "         saved_model_dir + scripted_quantized_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Changing only the quantization configuration raises accuracy to $75.68\\%$. Even so, this is still about 19 percentage points below the $95.09\\%$ baseline. Let's try quantization-aware training.\n", "\n", "## Quantization-aware training\n", "\n", "Quantization-aware training (QAT) is the quantization method that typically yields the highest accuracy. With QAT, all weights and activations are “fake quantized” during both the forward and backward passes of training: float values are rounded to mimic int8 values, but all computations are still done in floating point. All weight adjustments during training are therefore made while “aware” that the model will ultimately be quantized, so after quantization this method usually yields higher accuracy than either dynamic quantization or post-training static quantization.\n", "\n", "The overall workflow for performing QAT is very similar to before:\n", "\n", "- We can use the same model as before: no additional preparation is needed for quantization-aware training.\n", "- We need to use a `qconfig` that specifies what kind of fake quantization is inserted after weights and activations, instead of specifying observers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_qat_model(num_classes,\n", "                     model_path,\n", "                     quantize=False,\n", "                     backend='fbgemm'):\n", "    qat_model = create_model(quantize=quantize,\n", "                             num_classes=num_classes)\n", "    qat_model = load_model(qat_model, model_path)\n", "    qat_model.fuse_model()\n", "    qat_model.qconfig = get_default_qat_qconfig(backend=backend)\n", "    return qat_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, `prepare_qat` inserts the “fake quantization” modules, preparing the model for quantization-aware training:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# model_path is a required argument: pass the saved float checkpoint\n", "qat_model = create_qat_model(num_classes,\n", "                             saved_model_dir + float_model_file)\n", "qat_model = prepare_qat(qat_model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inverted residual block after QAT preparation; note the fake-quantization modules:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): ConvNormActivation(\n", "    (0): ConvBnReLU2d(\n", "      32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False\n", "      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", "      (weight_fake_quant): FusedMovingAvgObsFakeQuantize(\n", "        fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.qint8, quant_min=-128, quant_max=127, qscheme=torch.per_channel_symmetric, reduce_range=False\n", "        (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))\n", "      )\n", "      (activation_post_process): FusedMovingAvgObsFakeQuantize(\n", "        fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True\n", "        (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)\n", "      )\n", "    )\n", "    (1): Identity()\n", "    (2): Identity()\n", "  )\n", "  (1): ConvBn2d(\n", "    32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False\n", "    (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", "    (weight_fake_quant): FusedMovingAvgObsFakeQuantize(\n", "      fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.qint8, quant_min=-128, quant_max=127, qscheme=torch.per_channel_symmetric, reduce_range=False\n", "      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))\n", "    )\n", "    (activation_post_process): FusedMovingAvgObsFakeQuantize(\n", "      fake_quant_enabled=tensor([1]), observer_enabled=tensor([1]), scale=tensor([1.]), zero_point=tensor([0], dtype=torch.int32), dtype=torch.quint8, quant_min=0, quant_max=127, qscheme=torch.per_tensor_affine, reduce_range=True\n", "      (activation_post_process): MovingAverageMinMaxObserver(min_val=inf, max_val=-inf)\n", "    )\n", "  )\n", "  (2): Identity()\n", ")" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qat_model.features[1].conv" ] }, 
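{ "cell_type": "markdown", "metadata": {}, "source": [ "The fake-quantize modules listed above each carry a `scale`, a `zero_point`, and a `[quant_min, quant_max]` range. As a rough sketch of what per-tensor affine fake quantization computes (the standard affine scheme; the symbols $s$, $z$, $q_{\\min}$ and $q_{\\max}$ are notation introduced here, not names from this notebook), a float value $x$ is mapped to an integer level and straight back to float:\n", "\n", "$$\n", "q = \\operatorname{clamp}\\left(\\operatorname{round}\\left(\\frac{x}{s}\\right) + z,\\ q_{\\min},\\ q_{\\max}\\right), \\qquad \\hat{x} = s\\,(q - z)\n", "$$\n", "\n", "For illustration only, take the values reported for the converted PTQ block earlier, $s \\approx 0.2769$ and $z = 81$, with the range $[0, 127]$, and an assumed input $x = 1.0$. Then $x / s \\approx 3.61$ rounds to $4$, so $q = 4 + 81 = 85$ and $\\hat{x} \\approx 0.2769 \\times 4 \\approx 1.108$. The gap of roughly $0.108$ between $x$ and $\\hat{x}$ is the rounding error that training is exposed to during QAT, while all arithmetic stays in floating point." ] }, 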
{ "cell_type": "markdown", "metadata": {}, "source": [ "Training a quantized model with high accuracy requires accurate modeling of the numerics at inference time. For quantization-aware training, we therefore modify the training loop as follows:\n", "\n", "- Switch batch norm to use the running mean and variance towards the end of training, to better match inference numerics.\n", "- Freeze the quantizer parameters (scale and zero point) and fine-tune the weights." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8+loss 0.011, train acc 0.996, test acc 0.947\n", "180.9 examples/sec on cuda:2\n" ] } ], "source": [ "CV.train_fine_tuning(qat_model, train_iter, test_iter,\n", "                     learning_rate=learning_rate,\n", "                     num_epochs=30,\n", "                     device='cuda:2',\n", "                     param_group=True,\n", "                     is_freeze=False,\n", "                     is_quantized_acc=False,\n", "                     need_prepare=False,\n", "                     ylim=[0.8, 1])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "convert(qat_model.cpu().eval(), inplace=True)\n", "qat_model.eval();" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model size: 2.656573 MB\n", "\n", "QAT model:\n", "\tAccuracy evaluated on 10000 images: 94.86000\n" ] } ], "source": [ "print_info(qat_model, 'QAT model')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Quantization-aware training reaches $94.86\\%$ accuracy on the 10000-image CIFAR-10 test set, close to the float accuracy of $95.09\\%$.\n", "\n", "More on QAT:\n", "\n", "- QAT is a superset of post-training quantization techniques that allows for more debugging. For example, we can analyze whether the accuracy of the model is limited by weight quantization or by activation quantization.\n", "- We can also simulate the accuracy of a quantized model in floating point, since fake quantization models the numerics of the actual quantized arithmetic.\n", "- We can easily mimic post-training quantization as well.\n", "\n", "Save the QAT model:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "jit.save(jit.script(qat_model), saved_model_dir + scripted_qat_model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Speedup from quantization (to be updated; currently has issues)\n", "\n", "Finally, let's check something mentioned above: does the quantized model actually run inference faster?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Elapsed time: 15 ms\n", "Elapsed time: 219 ms\n" ] }, { "data": { "text/plain": [ "17.546305894851685" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import time\n", "\n", "def run_benchmark(model_file, img_loader):\n", "    elapsed = 0\n", "    model = torch.jit.load(model_file)\n", "    model.eval()\n", "    num_batches = 5\n", "    # Run the scripted model on a few batches of images\n", "    for i, (images, target) in enumerate(img_loader):\n", "        if i < num_batches:\n", "            start = time.time()\n", "            output = model(images)\n", "            end = time.time()\n", "            elapsed = elapsed + (end-start)\n", "        else:\n", "            break\n", "    num_images = images.size()[0] * num_batches\n", "\n", "    # Average time per image, in milliseconds\n", "    print('Elapsed time: %3.0f ms' % (elapsed/num_images*1000))\n", "    return elapsed\n", "\n", "run_benchmark(saved_model_dir + scripted_float_model_file, test_iter)\n", "run_benchmark(saved_model_dir + scripted_qat_model_file, test_iter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this local run, the float model takes about 15 ms per image while the quantized model takes about 219 ms, so the typical 2-4x speedup of quantized models over floating-point models is not reproduced here. This benchmark still needs to be fixed." ] } ], "metadata": { "interpreter": { "hash": "78526419bf48930935ba7e23437b2460cb231485716b036ebb8701887a294fa8" }, "kernelspec": { "display_name": "Python 3.10.0 ('torchx')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }