torch.ao.quantization.observer.MovingAverageMinMaxObserver#

class torch.ao.quantization.observer.MovingAverageMinMaxObserver(averaging_constant=0.01, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, **kwargs)[源代码]#

Observer module for computing the quantization parameters based on the moving average of the min and max values.

This observer computes the quantization parameters based on the moving averages of minimums and maximums of the incoming tensors. The module records the average minimum and maximum of incoming tensors, and uses this statistic to compute the quantization parameters.

参数
  • averaging_constant – Averaging constant for min/max.

  • dtype – Quantized data type

  • qscheme – Quantization scheme to be used

  • reduce_range – Reduces the range of the quantized data type by 1 bit

  • quant_min – Minimum quantization value. If unspecified, it will follow the 8-bit setup.

  • quant_max – Maximum quantization value. If unspecified, it will follow the 8-bit setup.

The moving average min/max is computed as follows

\[\begin{split}\begin{array}{ll} x_\text{min} = \begin{cases} \min(X) & \text{if~}x_\text{min} = \text{None} \\ (1 - c) x_\text{min} + c \min(X) & \text{otherwise} \end{cases}\\ x_\text{max} = \begin{cases} \max(X) & \text{if~}x_\text{max} = \text{None} \\ (1 - c) x_\text{max} + c \max(X) & \text{otherwise} \end{cases}\\ \end{array}\end{split}\]

where \(x_\text{min/max}\) is the running average min/max, \(X\) is is the incoming tensor, and \(c\) is the averaging_constant.

The scale and zero point are then computed as in MinMaxObserver.

备注

Only works with torch.per_tensor_affine quantization scheme.

备注

If the running minimum equals to the running maximum, the scale and zero_point are set to 1.0 and 0.