{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 量化张量\n",
    "\n",
    "参考：[量化张量](https://github.com/pytorch/pytorch/wiki/Introducing-Quantized-Tensor)\n",
    "\n",
    "## 创建量化张量\n",
    "\n",
    "- 通过量化非量化的浮点张量得到量化张量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[ 0.1231, -1.9974,  0.2806],\n",
       "         [ 0.6392, -0.2118,  0.6879]],\n",
       "\n",
       "        [[-0.1315, -0.4067,  0.5414],\n",
       "         [-0.4595, -1.7321, -0.5273]]], size=(2, 2, 3), dtype=torch.qint32,\n",
       "       quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "float_tensor = torch.randn(2, 2, 3)\n",
    "\n",
    "scale, zero_point = 1e-4, 2\n",
    "dtype = torch.qint32\n",
    "q_per_tensor = torch.quantize_per_tensor(float_tensor, scale, zero_point, dtype)\n",
    "q_per_tensor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "还支持逐通道量化："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[ 0.1000, -2.0000,  0.2810],\n",
       "         [ 0.6000, -0.2100,  0.6880]],\n",
       "\n",
       "        [[-0.1000, -0.4100,  0.5410],\n",
       "         [-0.5000, -1.7300, -0.5270]]], size=(2, 2, 3), dtype=torch.qint32,\n",
       "       quantization_scheme=torch.per_channel_affine,\n",
       "       scale=tensor([0.1000, 0.0100, 0.0010], dtype=torch.float64),\n",
       "       zero_point=tensor([-1,  0,  1]), axis=2)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scales = torch.tensor([1e-1, 1e-2, 1e-3])\n",
    "zero_points = torch.tensor([-1, 0, 1])\n",
    "channel_axis = 2\n",
    "q_per_channel = torch.quantize_per_channel(float_tensor,\n",
    "                                           scales,\n",
    "                                           zero_points,\n",
    "                                           axis=channel_axis,\n",
    "                                           dtype=dtype)\n",
    "q_per_channel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 直接从 `empty_quantized` 函数创建量化张量\n",
    "\n",
    "注意，`_empty_affine_quantized` 是一个私有 API，我们将用类似 torch 的方式替换它。将来使用 `empty_quantized_tensor(sizes, quantizer)`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([-0.0002, -0.0002, -0.0002, -0.0002,  0.0062, -0.0002,  0.0078, -0.0002,\n",
       "        -0.0002, -0.0002], size=(10,), dtype=torch.qint32,\n",
       "       quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q = torch._empty_affine_quantized([10],\n",
    "                                  scale=scale,\n",
    "                                  zero_point=zero_point,\n",
    "                                  dtype=dtype)\n",
    "q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 通过集合 int 张量和量化参数来创建量化张量\n",
    "\n",
    "```{note}\n",
    "注意，`_per_tensor_affine_qtensor` 是私有 API，我们将用类似 torch 的东西 `torch.form_tensor(int_tensor, quantizer)` 替换它\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "int_tensor = torch.randint(0, 100, size=(10,), dtype=torch.uint8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据类型为 `torch.quint8`，即对应的 `torch.uint8`，我们有以下对应的 torch int 类型和 torch 量化 int 类型：\n",
    "\n",
    "- `torch.uint8` -> `torch.quint8`\n",
    "- `torch.int8` -> `torch.qint8`\n",
    "- `torch.int32` -> `torch.qint32`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([ 6.4000e-03,  9.3000e-03,  3.7000e-03,  2.3000e-03, -1.0000e-04,\n",
       "         6.9000e-03,  9.2000e-03,  4.1000e-03,  1.1000e-03,  4.6000e-03],\n",
       "       size=(10,), dtype=torch.quint8,\n",
       "       quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q = torch._make_per_tensor_quantized_tensor(int_tensor, scale, zero_point)  # Note no `dtype`\n",
    "q "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在当前的 API 中，我们必须专一每个量化方案的函数，例如，如果我们想量化张量，我们将有 `quantize_per_tensor` 和 `quantize_per_channel`。类似地，对于 `q_scale` 和 `q_zero_point`，我们应该有以 `Quantizer` 作为参数的单一量化函数。为了检查量化参数，我们应该让量化张量返回 `Quantizer` 对象，这样我们就可以在 `Quantizer` 对象上检查量化参数，而不是把所有东西都放到张量 API 中。当前的基础设施还没有为这种支持做好准备，目前正在开发中。\n",
    "\n",
    "## 量化张量的运算\n",
    "\n",
    "```{rubric} 反量化\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([ 6.4000e-03,  9.3000e-03,  3.7000e-03,  2.3000e-03, -1.0000e-04,\n",
       "         6.9000e-03,  9.2000e-03,  4.1000e-03,  1.1000e-03,  4.6000e-03])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dequantized_tensor = q.dequantize()\n",
    "dequantized_tensor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{rubric} 支持切片\n",
    "```\n",
    "\n",
    "量化张量像通常的张量一样支持切片："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(0.0037, size=(), dtype=torch.quint8,\n",
       "       quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = q[2]\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{note}\n",
    "尺度（scale）和零点（zero_point）相同的量化张量，它包含与 `q_made_per_tensor[2, :]` 相同的原始量化张量的第二行值。\n",
    "```\n",
    "\n",
    "```{rubric} 赋值\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "q[0] = 3.5 # 量化 3.5 并将 int 值存储在量化张量中"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{rubric} 拷贝\n",
    "```\n",
    "\n",
    "我们可以从量化张量复制相同大小和 dtype 但不同尺度和零点的张量："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[-1.6000,  4.0000,  7.0000],\n",
       "        [ 3.3000,  6.6000,  8.6000]], size=(2, 3), dtype=torch.qint8,\n",
       "       quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=0)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scale1, zero_point1 = 1e-1, 0\n",
    "scale2, zero_point2 = 1, -1\n",
    "q1 = torch._empty_affine_quantized([2, 3],\n",
    "                                   scale=scale1,\n",
    "                                   zero_point=zero_point1,\n",
    "                                   dtype=torch.qint8)\n",
    "q2 = torch._empty_affine_quantized([2, 3],\n",
    "                                   scale=scale2,\n",
    "                                   zero_point=zero_point2,\n",
    "                                   dtype=torch.qint8)\n",
    "q2.copy_(q1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{rubric} Permutation\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[-1.6000,  4.0000,  7.0000],\n",
       "        [ 3.3000,  6.6000,  8.6000]], size=(2, 3), dtype=torch.qint8,\n",
       "       quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=0)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q1.transpose(0, 1)  # see https://pytorch.org/docs/stable/torch.html#torch.transpose\n",
    "q1.permute([1, 0])  # https://pytorch.org/docs/stable/tensors.html#torch.Tensor.permute\n",
    "q1.contiguous()  # Convert to contiguous Tensor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{rubric} 序列化与反序列化\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tempfile\n",
    "with tempfile.NamedTemporaryFile() as f:\n",
    "    torch.save(q2, f)\n",
    "    f.seek(0)\n",
    "    q3 = torch.load(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 检查量化张量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10, torch.Size([10]))"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check size of Tensor\n",
    "q.numel(), q.size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check whether the tensor is quantized\n",
    "q.is_quantized"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0001"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the scale of the quantized Tensor, only works for affine quantized tensor\n",
    "q.q_scale()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the zero_point of quantized Tensor\n",
    "q.q_zero_point()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([66, 95, 39, 25,  1, 71, 94, 43, 13, 48], dtype=torch.uint8)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# get the underlying integer representation of the quantized Tensor\n",
    "# int_repr() returns a Tensor of the corresponding data type of the quantized data type\n",
    "# e.g.for quint8 Tensor it returns a uint8 Tensor while preserving the MemoryFormat when possible\n",
    "q.int_repr()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# If a quantized Tensor is a scalar we can print the value:\n",
    "# item() will dequantize the current tensor and return a Scalar of float\n",
    "q[0].item()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([ 6.4000e-03,  9.3000e-03,  3.7000e-03,  2.3000e-03, -1.0000e-04,\n",
      "         6.9000e-03,  9.2000e-03,  4.1000e-03,  1.1000e-03,  4.6000e-03],\n",
      "       size=(10,), dtype=torch.quint8,\n",
      "       quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)\n"
     ]
    }
   ],
   "source": [
    "# printing\n",
    "print(q)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(0.0064, size=(), dtype=torch.quint8,\n",
      "       quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)\n"
     ]
    }
   ],
   "source": [
    "# indexing\n",
    "print(q[0]) # q[0] is a quantized Tensor with one value"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 量化的算子/内核\n",
    "\n",
    "我们也在研究量化算子，如量化 `QRelu`、`QAdd`、`QCat`、`QLinear`、`QConv` 等。我们要么使用简单的操作符实现，要么在操作符中封装 fbgemm 实现。所有的操作员都是在 C10 中注册的，而且他们现在只在 CPU 中。我们也有关于[如何写量化算子/内核的说明](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/README.md)。\n",
    "\n",
    "## 量化模型\n",
    "\n",
    "我们还有量化的模块，它们封装了这些内核实现，这些内核实现位于 `torch.nn.quantized` 命名空间中，将在模型开发中使用。我们将提供实用函数来将 `torch.nn.Module` 替换为 `torch.nn.quantized.Module`，但用户也可以自由地直接使用它们。我们会尽量将量化模块的 api 与 `torch.nn.Module` 中的对应 api 匹配。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<module 'torch.nn.qat' from '/home/pc/xinet/anaconda3/envs/torchx/lib/python3.10/site-packages/torch/nn/qat/__init__.py'>"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "torch.nn.qat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "7a45eadec1f9f49b0fdfd1bc7d360ac982412448ce738fa321afc640e3212175"
  },
  "kernelspec": {
   "display_name": "Python 3.10.4 ('torchx')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}