Quantized Tensors#
Reference: Quantized Tensor
Creating Quantized Tensors#
Get a quantized tensor by quantizing an unquantized float tensor:
import torch
float_tensor = torch.randn(2, 2, 3)
scale, zero_point = 1e-4, 2
dtype = torch.qint32
q_per_tensor = torch.quantize_per_tensor(float_tensor, scale, zero_point, dtype)
q_per_tensor
tensor([[[ 0.1231, -1.9974, 0.2806],
[ 0.6392, -0.2118, 0.6879]],
[[-0.1315, -0.4067, 0.5414],
[-0.4595, -1.7321, -0.5273]]], size=(2, 2, 3), dtype=torch.qint32,
quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)
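The stored values follow the affine mapping q_int = round(x / scale) + zero_point, and printing shows the dequantized values (q_int - zero_point) * scale. As a small check (a sketch added here for illustration, not part of the original example):
ints = q_per_tensor.int_repr()  # underlying int32 storage
print(torch.allclose((ints.float() - zero_point) * scale,
                     q_per_tensor.dequantize()))  # True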
Per-channel quantization is also supported:
scales = torch.tensor([1e-1, 1e-2, 1e-3])
zero_points = torch.tensor([-1, 0, 1])
channel_axis = 2
q_per_channel = torch.quantize_per_channel(float_tensor,
                                           scales,
                                           zero_points,
                                           axis=channel_axis,
                                           dtype=dtype)
q_per_channel
tensor([[[ 0.1000, -2.0000, 0.2810],
[ 0.6000, -0.2100, 0.6880]],
[[-0.1000, -0.4100, 0.5410],
[-0.5000, -1.7300, -0.5270]]], size=(2, 2, 3), dtype=torch.qint32,
quantization_scheme=torch.per_channel_affine,
scale=tensor([0.1000, 0.0100, 0.0010], dtype=torch.float64),
zero_point=tensor([-1, 0, 1]), axis=2)
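To confirm how the axis argument is applied (a small check added for illustration, not in the original), each slice along dim 2 is dequantized with its own scale and zero_point:
ints = q_per_channel.int_repr()
# manually dequantize channel 0 with its own parameters
manual = (ints[..., 0].float() - zero_points[0]) * scales[0]
print(torch.allclose(manual, q_per_channel.dequantize()[..., 0]))  # True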
Create a quantized tensor directly from the empty_quantized function
Note that _empty_affine_quantized is a private API; we will replace it with something like torch.empty_quantized_tensor(sizes, quantizer) in the future:
q = torch._empty_affine_quantized([10],
                                  scale=scale,
                                  zero_point=zero_point,
                                  dtype=dtype)
q
tensor([-0.0002, -0.0002, -0.0002, -0.0002, 0.0062, -0.0002, 0.0078, -0.0002,
-0.0002, -0.0002], size=(10,), dtype=torch.qint32,
quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)
Create a quantized tensor by combining an int tensor and quantization parameters
Note
Note that _make_per_tensor_quantized_tensor is a private API; we will replace it with something like torch.from_tensor(int_tensor, quantizer)
int_tensor = torch.randint(0, 100, size=(10,), dtype=torch.uint8)
The data type here is torch.quint8, which corresponds to torch.uint8. The torch int types map to the torch quantized int types as follows:
torch.uint8 -> torch.quint8
torch.int8 -> torch.qint8
torch.int32 -> torch.qint32
q = torch._make_per_tensor_quantized_tensor(int_tensor, scale, zero_point) # Note no `dtype`
q
tensor([ 6.4000e-03, 9.3000e-03, 3.7000e-03, 2.3000e-03, -1.0000e-04,
6.9000e-03, 9.2000e-03, 4.1000e-03, 1.1000e-03, 4.6000e-03],
size=(10,), dtype=torch.quint8,
quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)
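As a quick illustration of this mapping (a sketch added here, not from the original page), the underlying integer tensor returned by int_repr() has the plain dtype corresponding to each quantized dtype:
x = torch.randn(4)
for qdtype, idtype in [(torch.quint8, torch.uint8),
                       (torch.qint8, torch.int8),
                       (torch.qint32, torch.int32)]:
    xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=qdtype)
    assert xq.int_repr().dtype == idtype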
In the current API, we have to specialize a function for each quantization scheme: for example, to quantize a tensor we have both quantize_per_tensor and quantize_per_channel, and similarly for q_scale and q_zero_point. Instead, we should have a single quantize function that takes a Quantizer as an argument. And for inspecting quantization parameters, the quantized tensor should return a Quantizer object, so that we can inspect the parameters on the Quantizer object rather than putting everything into the tensor API. The current infrastructure is not yet ready for this kind of support and is under active development.
Operations on Quantized Tensors#
Dequantization
dequantized_tensor = q.dequantize()
dequantized_tensor
tensor([ 6.4000e-03, 9.3000e-03, 3.7000e-03, 2.3000e-03, -1.0000e-04,
6.9000e-03, 9.2000e-03, 4.1000e-03, 1.1000e-03, 4.6000e-03])
Slicing
Quantized tensors support slicing in the same way as regular tensors:
s = q[2]
s
tensor(0.0037, size=(), dtype=torch.quint8,
quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)
Note
s is a quantized tensor with the same scale and zero_point, containing the same values as the second row of the original quantized tensor, as in q_made_per_tensor[2, :].
Assignment
q[0] = 3.5 # quantize 3.5 and store the resulting int value in the quantized tensor
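To see what assignment stores, here is a minimal sketch (added for illustration, on a fresh tensor whose range comfortably covers 3.5): the float is converted with round(x / scale) + zero_point and clamped to the dtype's range before being stored.
xq = torch.quantize_per_tensor(torch.zeros(4), scale=0.1, zero_point=0,
                               dtype=torch.qint8)
xq[0] = 3.5
print(xq.int_repr()[0])  # tensor(35, dtype=torch.int8): round(3.5 / 0.1) + 0
print(xq[0].item())      # 3.5, recovered exactly since it is representable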
Copy
We can copy into a quantized tensor from another quantized tensor of the same size and dtype but with a different scale and zero_point (copy_ also takes over the source's quantization parameters, as the output below shows):
scale1, zero_point1 = 1e-1, 0
scale2, zero_point2 = 1, -1
q1 = torch._empty_affine_quantized([2, 3],
                                   scale=scale1,
                                   zero_point=zero_point1,
                                   dtype=torch.qint8)
q2 = torch._empty_affine_quantized([2, 3],
                                   scale=scale2,
                                   zero_point=zero_point2,
                                   dtype=torch.qint8)
q2.copy_(q1)
tensor([[-1.6000, 4.0000, 7.0000],
[ 3.3000, 6.6000, 8.6000]], size=(2, 3), dtype=torch.qint8,
quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=0)
Permutation
q1.transpose(0, 1) # see https://pytorch.org/docs/stable/torch.html#torch.transpose
q1.permute([1, 0]) # https://pytorch.org/docs/stable/tensors.html#torch.Tensor.permute
q1.contiguous() # Convert to contiguous Tensor
tensor([[-1.6000, 4.0000, 7.0000],
[ 3.3000, 6.6000, 8.6000]], size=(2, 3), dtype=torch.qint8,
quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=0)
Serialization and Deserialization
import tempfile
with tempfile.NamedTemporaryFile() as f:
    torch.save(q2, f)
    f.seek(0)
    q3 = torch.load(f)
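A quick sanity check (added here, not in the original) that the round trip preserves both the integer data and the quantization parameters:
assert torch.equal(q3.int_repr(), q2.int_repr())
assert q3.q_scale() == q2.q_scale()
assert q3.q_zero_point() == q2.q_zero_point()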
Inspecting a Quantized Tensor#
# Check size of Tensor
q.numel(), q.size()
(10, torch.Size([10]))
# Check whether the tensor is quantized
q.is_quantized
True
# Get the scale of the quantized Tensor, only works for affine quantized tensor
q.q_scale()
0.0001
# Get the zero_point of quantized Tensor
q.q_zero_point()
2
# get the underlying integer representation of the quantized Tensor
# int_repr() returns a Tensor of the plain dtype corresponding to the quantized dtype,
# e.g. for a quint8 Tensor it returns a uint8 Tensor, preserving the MemoryFormat when possible
q.int_repr()
tensor([66, 95, 39, 25, 1, 71, 94, 43, 13, 48], dtype=torch.uint8)
# If a quantized Tensor is a scalar we can print the value:
# item() will dequantize the current tensor and return a Scalar of float
q[0].item()
0
# printing
print(q)
tensor([ 6.4000e-03, 9.3000e-03, 3.7000e-03, 2.3000e-03, -1.0000e-04,
6.9000e-03, 9.2000e-03, 4.1000e-03, 1.1000e-03, 4.6000e-03],
size=(10,), dtype=torch.quint8,
quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)
# indexing
print(q[0]) # q[0] is a quantized Tensor with one value
tensor(0.0064, size=(), dtype=torch.quint8,
quantization_scheme=torch.per_tensor_affine, scale=0.0001, zero_point=2)
Quantized Operators/Kernels#
We are also working on quantized operators such as quantized QRelu, QAdd, QCat, QLinear, QConv, etc. We either implement the operator naively or wrap an fbgemm implementation inside it. All of the operators are registered in C10, and they currently run only on CPU. We also have instructions on how to write quantized operators/kernels.
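As an illustration of calling such a kernel directly (a minimal sketch, assuming a CPU build with the fbgemm or qnnpack backend; not part of the original text), quantized addition takes the output scale and zero_point explicitly, since the range of the result generally differs from that of the inputs:
a = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
b = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
# quantized::add is registered in C10 and reachable through torch.ops
c = torch.ops.quantized.add(a, b, 0.2, 0)  # output scale=0.2, zero_point=0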
Quantized Models#
We also have quantized modules that wrap these kernel implementations. They live in the torch.nn.quantized namespace and are meant to be used in model development. We will provide utility functions to replace torch.nn.Module with torch.nn.quantized.Module, but users are also free to use them directly. We try to match the APIs of the quantized modules with the corresponding APIs in torch.nn.Module.
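For example, here is a minimal sketch (an assumption for illustration, not from the original) of calling a quantized module directly; torch.nn.quantized.Linear consumes a quantized input and produces a quantized output using its own output scale and zero_point:
import torch.nn.quantized as nnq

xq = torch.quantize_per_tensor(torch.randn(2, 4), scale=0.1, zero_point=0,
                               dtype=torch.quint8)
m = nnq.Linear(4, 3)  # weight is uninitialized here; real parameters normally
                      # come from converting a trained float module
yq = m(xq)            # quantized output, using m.scale / m.zero_point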
torch.nn.qat
<module 'torch.nn.qat' from '/home/pc/xinet/anaconda3/envs/torchx/lib/python3.10/site-packages/torch/nn/qat/__init__.py'>