torch.ao.quantization.observer.HistogramObserver

class torch.ao.quantization.observer.HistogramObserver(bins: int = 2048, upsample_rate: int = 128, dtype: torch.dtype = torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None)

The module records a running histogram of tensor values along with the running min/max values. calculate_qparams computes the scale and zero_point from them.
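A framework-free sketch of this bookkeeping follows. The RunningHistogram class and its methods are hypothetical illustrations, not PyTorch's implementation. Its re-binning step shows one simplified way an upsample_rate (see Parameters) can be used to merge histograms recorded over different ranges: each old bin is split into upsample_rate sub-bins, and each sub-bin's share of the count is added to the new bin that contains its center.

```python
class RunningHistogram:
    """Hypothetical toy observer: running min/max plus an equal-width histogram."""

    def __init__(self, bins=16, upsample_rate=8):
        self.bins = bins
        self.upsample_rate = upsample_rate
        self.counts = [0.0] * bins
        self.min_val = None
        self.max_val = None

    def _bin_index(self, x, lo, hi):
        # Map value x to a bin index in [0, bins - 1] for the range [lo, hi].
        if hi == lo:
            return 0
        idx = int((x - lo) / (hi - lo) * self.bins)
        return min(max(idx, 0), self.bins - 1)

    def _rebin(self, new_lo, new_hi):
        # Upsample each old bin into upsample_rate equal sub-bins, spread its
        # count uniformly over them, and re-accumulate on the new range.
        old_lo, old_hi = self.min_val, self.max_val
        old_width = (old_hi - old_lo) / self.bins
        new_counts = [0.0] * self.bins
        for i, count in enumerate(self.counts):
            if count == 0:
                continue
            share = count / self.upsample_rate
            for j in range(self.upsample_rate):
                center = old_lo + (i + (j + 0.5) / self.upsample_rate) * old_width
                new_counts[self._bin_index(center, new_lo, new_hi)] += share
        self.counts = new_counts

    def observe(self, values):
        lo, hi = min(values), max(values)
        if self.min_val is None:
            self.min_val, self.max_val = lo, hi
        else:
            new_lo, new_hi = min(lo, self.min_val), max(hi, self.max_val)
            if (new_lo, new_hi) != (self.min_val, self.max_val):
                self._rebin(new_lo, new_hi)  # range grew: move old counts first
            self.min_val, self.max_val = new_lo, new_hi
        for v in values:
            self.counts[self._bin_index(v, self.min_val, self.max_val)] += 1


hist = RunningHistogram(bins=8, upsample_rate=4)
hist.observe([-1.0, -0.5, 0.0, 0.5, 1.0])
hist.observe([-3.0, 3.0])  # wider range, so the earlier counts are re-binned
print(hist.min_val, hist.max_val, sum(hist.counts))
```

Re-binning preserves the total count; only its placement across bins is approximated, and a larger upsample_rate makes that approximation finer at the cost of more work per update.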

Parameters
  • bins – Number of bins to use for the histogram

  • upsample_rate – Factor by which the histograms are upsampled; this is used to interpolate histograms with varying ranges across observations

  • dtype – Quantized data type

  • qscheme – Quantization scheme to be used

  • reduce_range – Reduces the range of the quantized data type by 1 bit
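The effect of reduce_range on the 8-bit integer ranges can be worked out directly: dropping one bit halves the number of levels from 256 to 128. The sketch below is hypothetical; it uses the strings "quint8" and "qint8" as stand-ins for torch.quint8 and torch.qint8 and is not a PyTorch function.

```python
def quant_range(dtype, reduce_range=False):
    """Hypothetical helper: (quant_min, quant_max) for an 8-bit quantized dtype name."""
    if dtype == "quint8":
        lo, hi = 0, 255
    elif dtype == "qint8":
        lo, hi = -128, 127
    else:
        raise ValueError(f"unsupported dtype: {dtype}")
    if reduce_range:
        # One bit fewer: 256 levels become 128, i.e. the bounds are halved.
        lo, hi = lo // 2, hi // 2
    return lo, hi


for dt in ("quint8", "qint8"):
    print(dt, quant_range(dt), quant_range(dt, reduce_range=True))
```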

The scale and zero point are computed as follows:

  1. Create the histogram of the incoming inputs.

    The histogram is updated continuously, and the bin ranges can change with every new tensor observed.

  2. Search the distribution in the histogram for optimal min/max values.

    The min/max values are chosen to minimize the quantization error with respect to the floating-point model.

  3. Compute the scale and zero point the same way as in the MinMaxObserver.
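Steps 2 and 3 can be sketched without PyTorch, starting from a histogram given as equal-width bin counts over [lo, hi]. Several assumptions apply: the function names are hypothetical; the error model (squared clipping distance measured from bin centers, plus step²/12 rounding noise for values inside the range) and the brute-force search over contiguous bin ranges are simplified stand-ins for HistogramObserver's internal search, not its code. The last function follows the per-tensor affine formula used by MinMaxObserver, with an eps floor assumed to be float32 machine epsilon.

```python
def clipped_range_error(counts, lo, hi, start, end, levels):
    """Estimate squared quantization error if the range is clipped to bins start..end."""
    width = (hi - lo) / len(counts)
    clip_lo = lo + start * width
    clip_hi = lo + (end + 1) * width
    step = (clip_hi - clip_lo) / (levels - 1)  # spacing between quantized values
    error = 0.0
    for i, count in enumerate(counts):
        if count == 0:
            continue
        center = lo + (i + 0.5) * width  # each value is approximated by its bin center
        if center < clip_lo:
            error += count * (clip_lo - center) ** 2  # clipped below the range
        elif center > clip_hi:
            error += count * (center - clip_hi) ** 2  # clipped above the range
        else:
            error += count * step ** 2 / 12.0  # uniform rounding noise inside the range
    return error


def search_min_max(counts, lo, hi, levels=256):
    """Brute-force search over contiguous bin ranges for the lowest estimated error."""
    n = len(counts)
    best = None
    for start in range(n):
        for end in range(start, n):
            err = clipped_range_error(counts, lo, hi, start, end, levels)
            if best is None or err < best[0]:
                best = (err, start, end)
    _, start, end = best
    width = (hi - lo) / n
    return lo + start * width, lo + (end + 1) * width


def minmax_qparams(min_val, max_val, quant_min=0, quant_max=255,
                   eps=1.1920928955078125e-07):
    """Per-tensor affine scale/zero point; the range is widened to include 0.0."""
    min_neg = min(min_val, 0.0)
    max_pos = max(max_val, 0.0)
    scale = max((max_pos - min_neg) / (quant_max - quant_min), eps)
    zero_point = quant_min - round(min_neg / scale)
    zero_point = min(max(zero_point, quant_min), quant_max)
    return scale, zero_point


counts = [100, 100, 100, 100] + [0] * 11 + [1]  # 16 bins over [0, 16); one outlier
for levels in (256, 4):
    new_min, new_max = search_min_max(counts, 0.0, 16.0, levels)
    print(levels, new_min, new_max, minmax_qparams(new_min, new_max, 0, levels - 1))
```

With 256 levels the rounding noise is small enough that keeping the outlier inside the range is cheaper, so the full range survives; with only 4 levels the search clips the outlier instead.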