torch.ao.quantization.observer.PerChannelMinMaxObserver
- class torch.ao.quantization.observer.PerChannelMinMaxObserver(ch_axis=0, dtype=torch.quint8, qscheme=torch.per_channel_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None, memoryless=False)[source]
Observer module for computing the quantization parameters based on the running per channel min and max values.
This observer records the running minimum and maximum of incoming tensors separately for each channel, and uses these statistics to compute the per channel quantization parameters.
- Parameters
ch_axis – Channel axis
dtype – Quantized data type
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
quant_min – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
quant_max – Maximum quantization value. If unspecified, it will follow the 8-bit setup.
memoryless – Boolean that controls whether the observer discards old data when a new input is seen. This is most useful for simulating dynamic quantization, especially during QAT.
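As a brief illustration of these parameters, the sketch below feeds two small tensors through an observer with the default settings and then reads back the per channel statistics. The input values are made up for the example, and `min_val` / `max_val` are the buffer names the observer is assumed to expose for its running statistics.

```python
import torch
from torch.ao.quantization.observer import PerChannelMinMaxObserver

# Default configuration: channel axis 0, quint8, per-channel affine.
obs = PerChannelMinMaxObserver(ch_axis=0)

# Two illustrative batches; each row is one channel along ch_axis=0.
obs(torch.tensor([[-1.0, 2.0], [0.5, 4.0]]))
obs(torch.tensor([[-3.0, 1.0], [0.0, 5.0]]))

# Running statistics hold one entry per channel.
print(obs.min_val, obs.max_val)

# One scale and one zero point per channel.
scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)
```

With `memoryless=True`, the second call would be expected to replace the statistics from the first call rather than extend them.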
The quantization parameters are computed the same way as in MinMaxObserver, with the difference that the running min/max values are stored per channel. Scales and zero points are thus computed per channel as well.
Note
If the running minimum equals the running maximum, the scales and zero_points are set to 1.0 and 0, respectively.
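To make the per channel computation concrete, here is a minimal pure-Python sketch of the affine case, assuming the formulas documented for MinMaxObserver: the range is extended to include zero, the scale is the range divided by the number of quantization steps, and the zero point is `quant_min - round(min / scale)`. The function name `per_channel_affine_qparams` and the `eps` floor are hypothetical choices for this illustration, not part of the library, and the sketch does not reproduce the special case described in the note above.

```python
def per_channel_affine_qparams(rows, quant_min=0, quant_max=255, eps=1e-8):
    """Compute (scales, zero_points) for a list of per-channel value lists."""
    scales, zero_points = [], []
    for values in rows:
        # Extend the observed range so that 0.0 is always inside it.
        lo = min(min(values), 0.0)
        hi = max(max(values), 0.0)
        # Floor the scale at eps (an assumption) to avoid dividing by zero.
        scale = max((hi - lo) / (quant_max - quant_min), eps)
        zp = quant_min - round(lo / scale)
        # Keep the zero point inside the quantized range.
        zp = max(quant_min, min(quant_max, zp))
        scales.append(scale)
        zero_points.append(zp)
    return scales, zero_points


# Each inner list holds the values seen for one channel.
scales, zero_points = per_channel_affine_qparams([[-1.0, 2.0], [0.5, 4.0]])
print(scales, zero_points)
```

Because each channel gets its own scale, a channel with a narrow value range keeps finer resolution than it would under a single scale shared by the whole tensor.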