{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 量化张量\n", "\n", "参考:[量化张量](https://github.com/pytorch/pytorch/wiki/Introducing-Quantized-Tensor)\n", "\n", "## 创建量化张量\n", "\n", "- 通过量化非量化的浮点张量得到量化张量" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[[ 0.1231, -1.9974, 0.2806],\n", " [ 0.6392, -0.2118, 0.6879]],\n", "\n", " [[-0.1315, -0.4067, 0.5414],\n", " [-0.4595, -1.7321, -0.5273]]], size=(2, 2, 3), dtype=torch.qint32,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch\n", "\n", "float_tensor = torch.randn(2, 2, 3)\n", "\n", "scale, zero_point = 1e-4, 2\n", "dtype = torch.qint32\n", "q_per_tensor = torch.quantize_per_tensor(float_tensor, scale, zero_point, dtype)\n", "q_per_tensor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "还支持逐通道量化:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[[ 0.1000, -2.0000, 0.2810],\n", " [ 0.6000, -0.2100, 0.6880]],\n", "\n", " [[-0.1000, -0.4100, 0.5410],\n", " [-0.5000, -1.7300, -0.5270]]], size=(2, 2, 3), dtype=torch.qint32,\n", " quantization_scheme=torch.per_channel_affine,\n", " scale=tensor([0.1000, 0.0100, 0.0010], dtype=torch.float64),\n", " zero_point=tensor([-1, 0, 1]), axis=2)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scales = torch.tensor([1e-1, 1e-2, 1e-3])\n", "zero_points = torch.tensor([-1, 0, 1])\n", "channel_axis = 2\n", "q_per_channel = torch.quantize_per_channel(float_tensor,\n", " scales,\n", " zero_points,\n", " axis=channel_axis,\n", " dtype=dtype)\n", "q_per_channel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- 直接从 `empty_quantized` 函数创建量化张量\n", "\n", "注意,`_empty_affine_quantized` 是一个私有 API,我们将用类似 torch 的方式替换它。将来使用 `empty_quantized_tensor(sizes, quantizer)`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([-0.0002, -0.0002, -0.0002, -0.0002, 0.0062, -0.0002, 0.0078, -0.0002,\n", " -0.0002, -0.0002], size=(10,), dtype=torch.qint32,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = torch._empty_affine_quantized([10],\n", " scale=scale,\n", " zero_point=zero_point,\n", " dtype=dtype)\n", "q" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- 通过集合 int 张量和量化参数来创建量化张量\n", "\n", "```{note}\n", "注意,`_per_tensor_affine_qtensor` 是私有 API,我们将用类似 torch 的东西 `torch.form_tensor(int_tensor, quantizer)` 替换它\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "int_tensor = torch.randint(0, 100, size=(10,), dtype=torch.uint8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "数据类型为 `torch.quint8`,即对应的 `torch.uint8`,我们有以下对应的 torch int 类型和 torch 量化 int 类型:\n", "\n", "- `torch.uint8` -> `torch.quint8`\n", "- `torch.int8` -> `torch.qint8`\n", "- `torch.int32` -> `torch.qint32`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([ 6.4000e-03, 9.3000e-03, 3.7000e-03, 2.3000e-03, -1.0000e-04,\n", " 6.9000e-03, 9.2000e-03, 4.1000e-03, 1.1000e-03, 4.6000e-03],\n", " size=(10,), dtype=torch.quint8,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = torch._make_per_tensor_quantized_tensor(int_tensor, scale, zero_point) # Note no `dtype`\n", "q " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在当前的 API 中,我们必须专一每个量化方案的函数,例如,如果我们想量化张量,我们将有 `quantize_per_tensor` 和 `quantize_per_channel`。类似地,对于 `q_scale` 和 `q_zero_point`,我们应该有以 `Quantizer` 作为参数的单一量化函数。为了检查量化参数,我们应该让量化张量返回 `Quantizer` 对象,这样我们就可以在 `Quantizer` 对象上检查量化参数,而不是把所有东西都放到张量 API 中。当前的基础设施还没有为这种支持做好准备,目前正在开发中。\n", "\n", "## 量化张量的运算\n", "\n", "```{rubric} 反量化\n", "```" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([ 6.4000e-03, 9.3000e-03, 3.7000e-03, 2.3000e-03, -1.0000e-04,\n", " 6.9000e-03, 9.2000e-03, 4.1000e-03, 1.1000e-03, 4.6000e-03])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dequantized_tensor = q.dequantize()\n", "dequantized_tensor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{rubric} 支持切片\n", "```\n", "\n", "量化张量像通常的张量一样支持切片:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor(0.0037, size=(), dtype=torch.quint8,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = q[2]\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "尺度(scale)和零点(zero_point)相同的量化张量,它包含与 `q_made_per_tensor[2, :]` 相同的原始量化张量的第二行值。\n", "```\n", "\n", "```{rubric} 赋值\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "q[0] = 3.5 # 量化 3.5 并将 int 值存储在量化张量中" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{rubric} 拷贝\n", "```\n", "\n", "我们可以从量化张量复制相同大小和 dtype 但不同尺度和零点的张量:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[-1.6000, 4.0000, 7.0000],\n", " [ 3.3000, 6.6000, 8.6000]], size=(2, 3), dtype=torch.qint8,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=0)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scale1, zero_point1 = 1e-1, 0\n", "scale2, zero_point2 = 1, -1\n", "q1 = torch._empty_affine_quantized([2, 3],\n", " scale=scale1,\n", " zero_point=zero_point1,\n", " dtype=torch.qint8)\n", "q2 = torch._empty_affine_quantized([2, 3],\n", " scale=scale2,\n", " zero_point=zero_point2,\n", " dtype=torch.qint8)\n", "q2.copy_(q1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{rubric} Permutation\n", "```" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[-1.6000, 4.0000, 7.0000],\n", " [ 3.3000, 6.6000, 8.6000]], size=(2, 3), dtype=torch.qint8,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=0)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q1.transpose(0, 1) # see https://pytorch.org/docs/stable/torch.html#torch.transpose\n", "q1.permute([1, 0]) # https://pytorch.org/docs/stable/tensors.html#torch.Tensor.permute\n", "q1.contiguous() # Convert to contiguous Tensor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{rubric} 序列化与反序列化\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import tempfile\n", "with tempfile.NamedTemporaryFile() as f:\n", " torch.save(q2, f)\n", " f.seek(0)\n", " q3 = torch.load(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 检查量化张量" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, torch.Size([10]))" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check size of Tensor\n", "q.numel(), q.size()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check whether the tensor is quantized\n", "q.is_quantized" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0001" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the scale of the quantized Tensor, only works for affine quantized tensor\n", "q.q_scale()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the zero_point of quantized Tensor\n", "q.q_zero_point()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([66, 95, 39, 25, 1, 71, 94, 43, 13, 48], dtype=torch.uint8)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the underlying integer representation of the quantized Tensor\n", "# int_repr() returns a Tensor of the corresponding data type of the quantized data type\n", "# e.g.for quint8 Tensor it returns a uint8 Tensor while preserving the MemoryFormat when possible\n", "q.int_repr()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# If a quantized Tensor is a scalar we can print the value:\n", "# item() will dequantize the current tensor and return a Scalar of float\n", "q[0].item()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([ 6.4000e-03, 9.3000e-03, 3.7000e-03, 2.3000e-03, -1.0000e-04,\n", " 6.9000e-03, 9.2000e-03, 4.1000e-03, 1.1000e-03, 4.6000e-03],\n", " size=(10,), dtype=torch.quint8,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)\n" ] } ], "source": [ "# printing\n", "print(q)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(0.0064, size=(), dtype=torch.quint8,\n", " quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)\n" ] } ], "source": [ "# indexing\n", "print(q[0]) # q[0] is a quantized Tensor with one value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 量化的算子/内核\n", "\n", "我们也在研究量化算子,如量化 `QRelu`、`QAdd`、`QCat`、`QLinear`、`QConv` 等。我们要么使用简单的操作符实现,要么在操作符中封装 fbgemm 实现。所有的操作员都是在 C10 中注册的,而且他们现在只在 CPU 中。我们也有关于[如何写量化算子/内核的说明](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/README.md)。\n", "\n", "## 量化模型\n", "\n", "我们还有量化的模块,它们封装了这些内核实现,这些内核实现位于 `torch.nn.quantized` 命名空间中,将在模型开发中使用。我们将提供实用函数来将 `torch.nn.Module` 替换为 `torch.nn.quantized.Module`,但用户也可以自由地直接使用它们。我们会尽量将量化模块的 api 与 `torch.nn.Module` 中的对应 api 匹配。" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.nn.qat" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "interpreter": { "hash": "7a45eadec1f9f49b0fdfd1bc7d360ac982412448ce738fa321afc640e3212175" }, "kernelspec": { "display_name": "Python 3.10.4 ('torchx')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }