torch.ao.quantization.observer.MinMaxObserver
- class torch.ao.quantization.observer.MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None, memoryless=False)
Observer module for computing the quantization parameters based on the running min and max values.
The module records the running minimum and maximum of incoming tensors and uses these statistics to compute the quantization parameters.
- Parameters
dtype – Quantized data type
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
quant_min – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
quant_max – Maximum quantization value. If unspecified, it will follow the 8-bit setup.
memoryless – Boolean that controls whether the observer discards old data when a new input is seen. This is most useful for simulating dynamic quantization, especially during QAT.
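As a rough illustration of how these parameters are used, the following minimal sketch builds an observer with the default dtype and qscheme, feeds it two tensors, and reads back the recorded range and the resulting quantization parameters. The input values are made up for the example, and the commented results assume the default 8-bit range of 0 to 255; exact behaviour may vary between PyTorch versions.

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Observer with the defaults spelled out explicitly.
observer = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)

# Illustrative inputs (not taken from the documentation).
first = torch.tensor([-1.0, 0.0, 2.0])
second = torch.tensor([0.5, 1.0])

# Calling the observer records min/max and returns the input unchanged.
observer(first)
observer(second)

# The running range covers everything seen so far, not just the last input.
running_min = observer.min_val.item()
running_max = observer.max_val.item()

# Quantization parameters derived from the running range.
scale, zero_point = observer.calculate_qparams()

print("range:", running_min, running_max)
print("scale:", scale.item(), "zero_point:", int(zero_point.item()))
```

With these inputs the recorded range should stay at [-1.0, 2.0] after the second call, because the second tensor lies inside the range already seen.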
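Given running min/max as $x_\text{min}$ and $x_\text{max}$, scale $s$ and zero point $z$ are computed as follows.

The running minimum/maximum $x_\text{min/max}$ is computed as:

$$
\begin{aligned}
x_\text{min} &= \begin{cases}
\min(X) & \text{if } x_\text{min} = \text{None} \\
\min\left(x_\text{min}, \min(X)\right) & \text{otherwise}
\end{cases} \\
x_\text{max} &= \begin{cases}
\max(X) & \text{if } x_\text{max} = \text{None} \\
\max\left(x_\text{max}, \max(X)\right) & \text{otherwise}
\end{cases}
\end{aligned}
$$

where $X$ is the observed tensor.

The scale $s$ and zero point $z$ are then computed as:

$$
\begin{aligned}
\text{if symmetric:} \quad & s = 2 \max\left(|x_\text{min}|, x_\text{max}\right) / \left(Q_\text{max} - Q_\text{min}\right) \\
& z = \begin{cases}
0 & \text{if dtype is qint8} \\
128 & \text{otherwise}
\end{cases} \\
\text{otherwise:} \quad & s = \left(x_\text{max} - x_\text{min}\right) / \left(Q_\text{max} - Q_\text{min}\right) \\
& z = Q_\text{min} - \text{round}\left(x_\text{min} / s\right)
\end{aligned}
$$

where $Q_\text{min}$ and $Q_\text{max}$ are the minimum and maximum of the quantized data type.

Warning
dtype can only take torch.qint8 or torch.quint8.

Note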
If the running minimum equals the running maximum, the scale and zero_point are set to 1.0 and 0, respectively.
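The running min/max update and the affine and symmetric cases described above can be sketched in plain Python. This is only an illustration of the documented formulas under stated assumptions, not the library's implementation: the helper names `update_min_max` and `compute_qparams` and the `signed` flag (standing in for "dtype is qint8") are hypothetical, and details such as tensor handling or numerical safeguards are omitted.

```python
from typing import Optional, Sequence, Tuple


def update_min_max(
    x_min: Optional[float], x_max: Optional[float], values: Sequence[float]
) -> Tuple[float, float]:
    """Fold a new batch of observed values into the running min/max."""
    cur_min, cur_max = min(values), max(values)
    new_min = cur_min if x_min is None else min(x_min, cur_min)
    new_max = cur_max if x_max is None else max(x_max, cur_max)
    return new_min, new_max


def compute_qparams(
    x_min: float,
    x_max: float,
    q_min: int,
    q_max: int,
    symmetric: bool = False,
    signed: bool = False,  # hypothetical flag: True stands for dtype qint8
) -> Tuple[float, int]:
    """Return (scale, zero_point) following the formulas in this section."""
    if x_min == x_max:
        # Degenerate range: fall back to scale 1.0 and zero point 0.
        return 1.0, 0
    if symmetric:
        scale = 2 * max(abs(x_min), x_max) / (q_max - q_min)
        zero_point = 0 if signed else 128
    else:
        scale = (x_max - x_min) / (q_max - q_min)
        zero_point = q_min - round(x_min / scale)
    return scale, zero_point


# Example: two batches observed in sequence, then affine quint8 parameters.
lo, hi = update_min_max(None, None, [-1.0, 0.0, 2.0])
lo, hi = update_min_max(lo, hi, [0.5, 1.0])
affine_scale, affine_zp = compute_qparams(lo, hi, q_min=0, q_max=255)
print(lo, hi, affine_scale, affine_zp)
```

In this example the running range is [-1.0, 2.0], so the affine scale is 3/255 and the zero point is 85 for the 0 to 255 range.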