torch.ao.quantization.observer.HistogramObserver
- class torch.ao.quantization.observer.HistogramObserver(bins: int = 2048, upsample_rate: int = 128, dtype: torch.dtype = torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None, factory_kwargs=None)
The module records a running histogram of tensor values along with the min/max values.
calculate_qparams will calculate the scale and zero_point.
- Parameters
bins – Number of bins to use for the histogram
upsample_rate – Factor by which the histograms are upsampled; this is used to interpolate histograms with varying ranges across observations
dtype – Quantized data type
qscheme – Quantization scheme to be used
reduce_range – Reduces the range of the quantized data type by 1 bit
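A minimal usage sketch, assuming a PyTorch build that exposes this class at the path shown above. The input shapes, the seed, and the number of calibration batches are arbitrary choices made for illustration and do not come from this page.

```python
import torch
from torch.ao.quantization.observer import HistogramObserver

torch.manual_seed(0)  # illustrative only: makes the random calibration data repeatable

# Construct the observer with the documented arguments left at their defaults.
observer = HistogramObserver(
    bins=2048,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
    reduce_range=False,
)

# Calling the observer on a tensor updates its running histogram and min/max.
for _ in range(4):
    observer(torch.randn(64, 32))

# Derive the quantization parameters from the accumulated statistics.
scale, zero_point = observer.calculate_qparams()
print(scale, zero_point)
```

In a real calibration pass the random tensors would be replaced by activations from representative inputs.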
The scale and zero point are computed as follows:
- Create the histogram of the incoming inputs.
The histogram is computed continuously, and the ranges per bin change with every new tensor observed.
- Search the distribution in the histogram for optimal min/max values.
The search picks the min/max values that minimize the quantization error with respect to the floating point model.
- Compute the scale and zero point the same way as in the MinMaxObserver.
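The last step can be sketched in plain Python, assuming the per-tensor affine formula described for MinMaxObserver: the range is widened to include zero, the scale is the range divided by the number of quantization steps, and the zero point is offset from quant_min. The function name affine_qparams and the default eps value are assumptions made for this illustration, and the histogram search that selects min_val and max_val is not reproduced here.

```python
def affine_qparams(min_val, max_val, quant_min=0, quant_max=255,
                   eps=1.1920928955078125e-07):
    """Hypothetical helper: scale and zero point for a per-tensor affine scheme."""
    # Widen the range so that 0.0 is always exactly representable.
    min_neg = min(min_val, 0.0)
    max_pos = max(max_val, 0.0)

    # One quantization step covers this much of the floating point range.
    scale = (max_pos - min_neg) / float(quant_max - quant_min)
    scale = max(scale, eps)  # guard against a zero-width range

    # The integer that the real value 0.0 maps to, clamped to the valid range.
    zero_point = quant_min - round(min_neg / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, int(zero_point)


scale, zero_point = affine_qparams(-1.0, 3.0)
print(scale, zero_point)
```

For the range [-1.0, 3.0] and an 8-bit unsigned target, the scale is 4/255 and the zero point is 64, so the real value 0.0 maps to the integer 64.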