Enabling Hessian-based Mixed Precision

In Mixed Precision quantization, MCT assigns a different bit width to each layer's weights, based on the layer's sensitivity to quantization and a resource constraint defined by the user, such as a target model size.

Check out the Mixed Precision tutorial for more information.
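
For illustration, the resource constraint can be expressed as a target resource utilization and passed to the quantization call together with the CoreConfig. The snippet below is a minimal sketch assuming a recent MCT release in which the constraint is defined via mct.core.ResourceUtilization (older releases used mct.core.KPI); the 1 MB budget is a made-up example value.

import model_compression_toolkit as mct

# Hypothetical budget: cap the memory of the quantized weights at 1 MB.
# The Mixed Precision search picks per-layer bit widths that fit this budget.
target_resource_utilization = mct.core.ResourceUtilization(weights_memory=1 * 1024 * 1024)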

Overview

MCT offers a Hessian-based scoring mechanism to assess the importance of layers during the Mixed Precision search.

This feature can notably enhance Mixed Precision outcomes for certain network architectures.
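
As a rough intuition only (this sketches the general idea behind Hessian-based importance scoring, not necessarily MCT's exact formulation), a layer's score can be obtained with a Hutchinson-style approximation of the Hessian trace:

    score_l ≈ (1/K) * sum_{k=1..K} v_k^T * H_l * v_k

where H_l is the Hessian of the model output with respect to layer l's weights (or activations), and v_1, ..., v_K are random vectors (e.g., Rademacher). Layers with larger scores are treated as more sensitive and are steered toward higher bit widths in the search.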

Solution

Set the use_hessian_based_scores flag to True in the MixedPrecisionQuantizationConfig of the CoreConfig.

import model_compression_toolkit as mct

mixed_precision_config = mct.core.MixedPrecisionQuantizationConfig(use_hessian_based_scores=True)
core_config = mct.core.CoreConfig(mixed_precision_config=mixed_precision_config)
# '...' stands for the call's remaining arguments (e.g., the float model and a representative data generator).
quantized_model, _ = mct.ptq.pytorch_post_training_quantization(...,
                                                                core_config=core_config)
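
Note that this configuration only enables the Hessian-based scores; the Mixed Precision search itself is driven by the resource constraint (e.g., a ResourceUtilization target as sketched above), which is passed to the same quantization call as part of its remaining arguments.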

Note

Computing Hessian scores can be computationally intensive and may significantly extend the Mixed Precision search runtime. In addition, these scores may introduce unexpected noise into the Mixed Precision search, so enabling them can require a deeper understanding of the underlying mechanism and some re-tuning of the Mixed Precision configuration.