MixedPrecisionQuantizationConfig
Class to configure the quantization process of the model when quantizing in mixed-precision:
- class model_compression_toolkit.core.MixedPrecisionQuantizationConfig(compute_distance_fn=None, distance_weighting_method=None, num_of_images=32, configuration_overwrite=None, num_interest_points_factor=1.0, use_hessian_based_scores=False, norm_scores=True, refine_mp_solution=True, metric_normalization_threshold=10000000000.0, hessian_batch_size=32, metric_normalization=MpMetricNormalization.NONE, metric_epsilon=1e-06, exp_distance_weighting_sigma=0.1, custom_metric_fn=None)
Class with mixed precision parameters to quantize the input model.
- Parameters:
compute_distance_fn (Callable) – Function to compute a distance between two tensors. If None, using pre-defined distance methods based on the layer type for each layer.
distance_weighting_method (MpDistanceWeighting) – Distance weighting method to use. By default, MpDistanceWeighting.AVG.
num_of_images (int) – Number of images to use to evaluate the sensitivity of a mixed-precision model compared to the float model.
configuration_overwrite (List[int]) – A list of integers that overrides the mixed-precision configuration with a predefined one.
num_interest_points_factor (float) – A multiplication factor between zero and one (represents percentage) to reduce the number of interest points used to calculate the distance metric.
use_hessian_based_scores (bool) – Whether to use Hessian-based scores for weighted average distance metric computation. This is identical to passing distance_weighting_method=MpDistanceWeighting.HESSIAN.
norm_scores (bool) – Whether to normalize the returned scores for the weighted distance metric (to get values between 0 and 1).
refine_mp_solution (bool) – Whether to try to improve the final mixed-precision configuration using a greedy algorithm that searches for layers whose bit-width can be increased.
metric_normalization_threshold (float) – A threshold for checking the mixed precision distance metric values. If values are larger than this threshold, the metric is scaled to prevent numerical issues.
hessian_batch_size (int) – The Hessian computation batch size. Used only when using mixed precision with a Hessian-based objective.
metric_normalization (MpMetricNormalization) – Metric normalization method.
metric_epsilon (float | None) – Ensures a minimal distance between the metric of any non-max-bitwidth candidate and that of the max-bitwidth candidate, i.e. metric(non-max-bitwidth) >= metric(max-bitwidth) + epsilon. If None, the computed metrics are used as is.
exp_distance_weighting_sigma (float) – Sigma for the exponential weighting method. The distance for each interest point is normalized by sigma prior to applying the exponent.
custom_metric_fn (Callable) – Function to compute a custom metric. It receives model_mp as input and returns a float metric value. If None, the interest point metric is used.
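The constraint described for metric_epsilon can be illustrated with a small, self-contained sketch. It does not use the toolkit's code: the helper name apply_metric_epsilon and the dictionary layout (candidate bit-width mapped to its metric for a single layer) are assumptions made for illustration, and the library may enforce the constraint differently internally.

```python
from typing import Dict, Optional


def apply_metric_epsilon(metrics: Dict[int, float],
                         epsilon: Optional[float]) -> Dict[int, float]:
    """Hypothetical helper (not part of the library).

    `metrics` maps each candidate bit-width of ONE layer to its sensitivity
    metric. Every non-max-bitwidth candidate is raised, if needed, so that
    metric(non-max-bitwidth) >= metric(max-bitwidth) + epsilon.
    """
    if epsilon is None:
        # Mirrors "If None, the computed metrics are used as is."
        return dict(metrics)

    max_bits = max(metrics)                # the max-bitwidth candidate
    floor = metrics[max_bits] + epsilon    # lowest value allowed for the others
    return {
        bits: (value if bits == max_bits else max(value, floor))
        for bits, value in metrics.items()
    }


# Illustrative numbers only: the 4-bit metric is (noisily) below the 8-bit one.
layer_metrics = {8: 0.010, 4: 0.009, 2: 0.250}
adjusted = apply_metric_epsilon(layer_metrics, epsilon=1e-6)
```

In this sketch only the 4-bit entry changes, because it was the only candidate that violated the inequality. The 8-bit and 2-bit metrics are returned unchanged.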
MpDistanceWeighting
- class model_compression_toolkit.core.MpDistanceWeighting(value)
Defines interest points distances weighting methods.
AVG - take the average distance over all interest points.
LAST_LAYER - take only the distance of the last interest point.
EXP - weighted average with weights based on exponent of negative distances between activations of the quantized and the float models.
HESSIAN - weighted average with Hessians as weights.
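The four weighting methods can be compared in a short sketch that reduces a list of per-interest-point distances to one score. This is a simplified illustration, not the library's implementation: the function name combine_distances, the string method names, and the EXP weight formula exp(-d / sigma) are assumptions based on the descriptions above, and the Hessian scores are taken as given rather than computed.

```python
import math
from typing import Optional, Sequence


def combine_distances(distances: Sequence[float],
                      method: str = "AVG",
                      sigma: float = 0.1,
                      hessian_scores: Optional[Sequence[float]] = None) -> float:
    """Hypothetical helper: collapse per-interest-point distances into one score."""
    if method == "AVG":
        # Plain average over all interest points.
        return sum(distances) / len(distances)
    if method == "LAST_LAYER":
        # Only the last interest point counts.
        return distances[-1]
    if method == "EXP":
        # Assumed form: each distance is divided by sigma, negated, exponentiated.
        weights = [math.exp(-d / sigma) for d in distances]
    elif method == "HESSIAN":
        if hessian_scores is None:
            raise ValueError("HESSIAN weighting needs one score per interest point")
        weights = list(hessian_scores)
    else:
        raise ValueError(f"unknown method: {method}")

    # Weighted average with the weights chosen above.
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, distances)) / total


# Illustrative distances at three interest points.
distances = [0.1, 0.2, 0.6]
avg_score = combine_distances(distances, "AVG")
last_score = combine_distances(distances, "LAST_LAYER")
exp_score = combine_distances(distances, "EXP", sigma=0.5)
hessian_score = combine_distances(distances, "HESSIAN", hessian_scores=[1.0, 1.0, 2.0])
```

Under the assumed EXP formula, interest points with small distances receive larger weights, so the EXP score here falls below the plain average.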
MpMetricNormalization
- class model_compression_toolkit.core.MpMetricNormalization(value)
MAXBIT: normalize sensitivity metrics of layer candidates by max-bitwidth candidate (of that layer).
MINBIT: normalize sensitivity metrics of layer candidates by min-bitwidth candidate (of that layer).
NONE: no normalization.
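The three normalization options can be sketched as follows for a single layer. The sketch assumes that "normalize by" means dividing each candidate's metric by the reference candidate's metric, and that the reference metric is non-zero. Neither the helper name normalize_layer_metrics nor this exact formula is taken from the library.

```python
from typing import Dict


def normalize_layer_metrics(metrics: Dict[int, float],
                            method: str = "NONE") -> Dict[int, float]:
    """Hypothetical helper: `metrics` maps candidate bit-width -> metric for one layer."""
    if method == "NONE":
        return dict(metrics)
    if method == "MAXBIT":
        reference = metrics[max(metrics)]   # metric of the max-bitwidth candidate
    elif method == "MINBIT":
        reference = metrics[min(metrics)]   # metric of the min-bitwidth candidate
    else:
        raise ValueError(f"unknown method: {method}")
    # Assumed normalization: divide by the reference (must be non-zero).
    return {bits: value / reference for bits, value in metrics.items()}


# Illustrative numbers only.
layer_metrics = {8: 0.02, 4: 0.08, 2: 0.40}
by_maxbit = normalize_layer_metrics(layer_metrics, "MAXBIT")
by_minbit = normalize_layer_metrics(layer_metrics, "MINBIT")
unchanged = normalize_layer_metrics(layer_metrics, "NONE")
```

With this reading, the reference candidate always maps to 1.0, which puts layers with very different raw metric magnitudes on a comparable scale.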