MixedPrecisionQuantizationConfig

Class to configure the quantization process of the model when quantizing in mixed-precision:

class model_compression_toolkit.core.MixedPrecisionQuantizationConfig(compute_distance_fn=None, distance_weighting_method=None, num_of_images=32, configuration_overwrite=None, num_interest_points_factor=1.0, use_hessian_based_scores=False, norm_scores=True, refine_mp_solution=True, metric_normalization_threshold=10000000000.0, hessian_batch_size=32, metric_normalization=MpMetricNormalization.NONE, metric_epsilon=1e-06, exp_distance_weighting_sigma=0.1, custom_metric_fn=None)

Class with mixed precision parameters to quantize the input model.

Parameters:
  • compute_distance_fn (Callable) – Function to compute a distance between two tensors. If None, a predefined distance method is selected for each layer based on the layer type.

  • distance_weighting_method (MpDistanceWeighting) – Distance weighting method to use. If None (the default), MpDistanceWeighting.AVG is used.

  • num_of_images (int) – Number of images used to evaluate the sensitivity of a mixed-precision model compared to the float model.

  • configuration_overwrite (List[int]) – A list of integers that overrides the mixed-precision configuration with a predefined one.

  • num_interest_points_factor (float) – A multiplication factor between zero and one (a fraction) used to reduce the number of interest points used to calculate the distance metric.

  • use_hessian_based_scores (bool) – Whether to use Hessian-based scores for weighted average distance metric computation. This is identical to passing distance_weighting_method=MpDistanceWeighting.HESSIAN.

  • norm_scores (bool) – Whether to normalize the returned scores for the weighted distance metric (to get values between 0 and 1).

  • refine_mp_solution (bool) – Whether to try to improve the final mixed-precision configuration using a greedy algorithm that searches for layers whose bit-width can be increased.

  • metric_normalization_threshold (float) – A threshold for checking the mixed-precision distance metric values. If values exceed this threshold, the metric is scaled to prevent numerical issues.

  • hessian_batch_size (int) – The Hessian computation batch size. Used only when running mixed precision with a Hessian-based objective.

  • metric_normalization (MpMetricNormalization) – Metric normalization method.

  • metric_epsilon (float | None) – Ensures a minimal distance between the metric of any non-max-bitwidth candidate and that of the max-bitwidth candidate, i.e. metric(non-max-bitwidth) >= metric(max-bitwidth) + epsilon. If None, the computed metrics are used as is.

  • exp_distance_weighting_sigma (float) – Sigma for the exponential weighting method. The distance of each interest point is normalized by sigma before the exponent is applied.

  • custom_metric_fn (Callable) – Function to compute a custom metric. It receives model_mp as input and returns a float metric value. If None, the interest point metric is used.
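
The inequality given for metric_epsilon can be illustrated with a short sketch. This is not the toolkit's implementation: the function name enforce_metric_epsilon and the dictionary layout (bit-width mapped to the metric value of one layer's candidates) are assumptions made purely for illustration.

```python
def enforce_metric_epsilon(metrics_by_bitwidth, epsilon=1e-6):
    """Hypothetical helper: apply metric(non-max) >= metric(max) + epsilon.

    metrics_by_bitwidth maps a candidate bit-width to its sensitivity metric
    for a single layer (an assumed layout, not the toolkit's data structure).
    """
    if epsilon is None:
        # No adjustment requested: the computed metrics are used as is.
        return dict(metrics_by_bitwidth)

    max_bits = max(metrics_by_bitwidth)
    floor = metrics_by_bitwidth[max_bits] + epsilon

    # Raise every non-max-bitwidth metric to at least the floor value.
    return {
        bits: metric if bits == max_bits else max(metric, floor)
        for bits, metric in metrics_by_bitwidth.items()
    }


adjusted = enforce_metric_epsilon({8: 0.5, 4: 0.4, 2: 0.9}, epsilon=0.25)
unchanged = enforce_metric_epsilon({8: 0.5, 4: 0.4, 2: 0.9}, epsilon=None)
```

In this sketch the 4-bit candidate, whose metric was below that of the 8-bit candidate, is lifted to the floor, while the 2-bit candidate already satisfies the inequality and is left alone.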

MpDistanceWeighting

class model_compression_toolkit.core.MpDistanceWeighting(value)

Defines interest points distances weighting methods.

AVG - take the average distance over all interest points.

LAST_LAYER - take only the distance of the last interest point.

EXP - weighted average with weights based on exponent of negative distances between activations of the quantized and the float models.

HESSIAN - weighted average with Hessians as weights.
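
The four weighting methods can be compared with a minimal sketch over a list of per-interest-point distances. It is an illustration under stated assumptions, not the toolkit's code: the function combine_distances is hypothetical, the EXP weights are assumed to be exp(-distance / sigma) as one plausible reading of the description above, and the Hessian scores are simply passed in as precomputed weights.

```python
import math


def combine_distances(distances, method="AVG", sigma=0.1, hessian_weights=None):
    """Hypothetical helper: reduce per-interest-point distances to one value."""
    if method == "AVG":
        return sum(distances) / len(distances)
    if method == "LAST_LAYER":
        return distances[-1]
    if method == "EXP":
        # Assumed form: each distance is divided by sigma, negated, exponentiated.
        weights = [math.exp(-d / sigma) for d in distances]
    elif method == "HESSIAN":
        if hessian_weights is None:
            raise ValueError("HESSIAN weighting needs precomputed weights")
        weights = list(hessian_weights)
    else:
        raise ValueError(f"unknown method: {method}")

    total = sum(weights)
    if total == 0:
        # All weights vanished (e.g. exp underflow): fall back to a plain average.
        return sum(distances) / len(distances)
    return sum(w * d for w, d in zip(weights, distances)) / total


distances = [0.1, 0.2, 0.3]
avg = combine_distances(distances, "AVG")
last = combine_distances(distances, "LAST_LAYER")
exp_weighted = combine_distances(distances, "EXP", sigma=0.1)
hessian_weighted = combine_distances(distances, "HESSIAN", hessian_weights=[1.0, 0.0, 0.0])
```

Under the assumed EXP form, interest points with smaller distances receive larger weights, so the EXP result here is lower than the plain average.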

MpMetricNormalization

class model_compression_toolkit.core.MpMetricNormalization(value)

MAXBIT: normalize sensitivity metrics of layer candidates by max-bitwidth candidate (of that layer).

MINBIT: normalize sensitivity metrics of layer candidates by min-bitwidth candidate (of that layer).

NONE: no normalization.
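
As a rough illustration of the three options, the sketch below normalizes the candidate metrics of a single layer. It assumes that "normalize by" means dividing by the reference candidate's metric; the function normalize_layer_metrics, the dictionary layout, and the small guard value are hypothetical and not taken from the toolkit.

```python
def normalize_layer_metrics(metrics_by_bitwidth, method="NONE", tiny=1e-12):
    """Hypothetical helper: normalize one layer's candidate metrics.

    metrics_by_bitwidth maps a candidate bit-width to its sensitivity metric.
    """
    if method == "NONE":
        return dict(metrics_by_bitwidth)
    if method == "MAXBIT":
        reference_bits = max(metrics_by_bitwidth)
    elif method == "MINBIT":
        reference_bits = min(metrics_by_bitwidth)
    else:
        raise ValueError(f"unknown method: {method}")

    # Guard against dividing by a zero reference metric (an added assumption).
    reference = max(metrics_by_bitwidth[reference_bits], tiny)
    return {bits: metric / reference for bits, metric in metrics_by_bitwidth.items()}


layer_metrics = {8: 0.5, 4: 1.0, 2: 4.0}
by_maxbit = normalize_layer_metrics(layer_metrics, "MAXBIT")
by_minbit = normalize_layer_metrics(layer_metrics, "MINBIT")
untouched = normalize_layer_metrics(layer_metrics, "NONE")
```

With MAXBIT the 8-bit candidate becomes the unit reference, and with MINBIT the 2-bit candidate does, which puts the metrics of different layers on a comparable scale in this sketch.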