wrapper
- class model_compression_toolkit.wrapper.mct_wrapper.MCTWrapper
Wrapper class for Model Compression Toolkit (MCT) quantization and export.
This class provides a unified interface to several neural network quantization methods, including Post-Training Quantization (PTQ) and Gradient-based Post-Training Quantization (GPTQ). It supports both the TensorFlow and PyTorch frameworks, with optional mixed-precision quantization.
The wrapper manages the complete quantization pipeline from model input to quantized model export, handling framework-specific configurations and Target Platform Capabilities (TPC) setup.
- quantize_and_export(float_model, representative_dataset, framework='pytorch', method='PTQ', use_mixed_precision=False, param_items=None)
Main function to perform model quantization and export.
- Parameters:
float_model – The float model to be quantized.
representative_dataset (Callable, np.array, or tf.Tensor) – Representative dataset used for calibration.
framework (str) – 'tensorflow' or 'pytorch'. Default: 'pytorch'
method (str) – Quantization method, e.g., 'PTQ' or 'GPTQ'. Default: 'PTQ'
use_mixed_precision (bool) – Whether to use mixed-precision quantization. Default: False
param_items (list) – List of parameter overrides in the form [[key, value], …]. Default: None
- Return type:
Tuple[bool, Any]
- Returns:
A tuple (quantization success flag, quantized model).
Examples
Import MCT:
>>> import model_compression_toolkit as mct
Prepare the float model and representative dataset:
>>> float_model = ...
>>> representative_dataset = ...
Create an instance of the MCTWrapper:
>>> wrapper = mct.MCTWrapper()
Set the framework, method, and other options:
>>> framework = 'tensorflow'
>>> method = 'PTQ'
>>> use_mixed_precision = False
Set parameter overrides if needed:
>>> param_items = [[key, value]...]
Quantize and export the model:
>>> flag, quantized_model = wrapper.quantize_and_export(
...     float_model=float_model,
...     representative_dataset=representative_dataset,
...     framework=framework,
...     method=method,
...     use_mixed_precision=use_mixed_precision,
...     param_items=param_items
... )
Parameters
MCTWrapper is initialized with default parameters. Users can override the following parameters via param_items.
Note
Parameters marked (low priority) can usually be left at their default values, so there is no need to specify them. Override them only when necessary, for example if you receive a warning from the XQuant Extension Tool.
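As a sketch of the expected format, param_items is a plain list of two-element [key, value] lists. A small helper like the one below (hypothetical, not part of MCT) can build it from a dict of overrides:

```python
def make_param_items(overrides):
    """Convert a dict of parameter overrides into the
    [[key, value], ...] list format expected by quantize_and_export()."""
    return [[key, value] for key, value in overrides.items()]

# Only parameters you want to change need to appear;
# everything else keeps its default value.
param_items = make_param_items({
    "sdsp_version": "3.14",
    "save_model_path": "./qmodel.onnx",
})
```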
PTQ
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
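A plain-PTQ override list built from the keys in the table above might look like the following sketch (the z_threshold value of 16.0 is purely illustrative; the default is float('inf')):

```python
# Illustrative PTQ overrides; keys are taken from the PTQ parameter table.
param_items = [
    ["sdsp_version", "3.14"],               # SDSP converter version
    ["save_model_path", "./qmodel.keras"],  # Keras output path (TensorFlow)
    ["z_threshold", 16.0],                  # illustrative replacement for float('inf')
]
```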
PTQ, mixed_precision
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
num_of_images | 32 | Number of images used for the mixed-precision search
weights_compression_ratio | 0.75 | Weights compression ratio for mixed-precision resource utilization (0.0 to 1.0)
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
distance_weighting_method | default of MixedPrecisionQuantizationConfig | Distance weighting method for mixed precision (low priority)
use_hessian_based_scores | False | Use Hessian-based scores for mixed precision (low priority)
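Mixed precision adds num_of_images and weights_compression_ratio to the plain-PTQ keys. The sketch below (values illustrative) includes a sanity check on the documented 0.0 to 1.0 range:

```python
num_of_images = 32               # images used during the mixed-precision search
weights_compression_ratio = 0.5  # illustrative: keep half the weight footprint

# The parameter table documents the valid range as 0.0 to 1.0.
assert 0.0 <= weights_compression_ratio <= 1.0

param_items = [
    ["num_of_images", num_of_images],
    ["weights_compression_ratio", weights_compression_ratio],
]
```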
GPTQ
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
n_epochs | 5 | Number of training epochs for GPTQ
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
optimizer | default of get_keras_gptq_config or get_pytorch_gptq_config | Optimizer for GPTQ (low priority)
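GPTQ introduces n_epochs on top of the PTQ keys. A minimal override sketch (the epoch count of 10 is illustrative, not a recommendation):

```python
# Illustrative GPTQ overrides; keys are taken from the GPTQ parameter table.
param_items = [
    ["n_epochs", 10],                      # illustrative: more epochs than the default 5
    ["save_model_path", "./qmodel.onnx"],  # ONNX output path (PyTorch)
]
# Hypothetical call shape, following the Examples section above:
# flag, qmodel = wrapper.quantize_and_export(
#     float_model, representative_dataset,
#     framework='pytorch', method='GPTQ', param_items=param_items)
```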
GPTQ, mixed_precision
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
num_of_images | 32 | Number of images used for the mixed-precision search
weights_compression_ratio | 0.75 | Weights compression ratio for mixed-precision resource utilization (0.0 to 1.0)
n_epochs | 5 | Number of training epochs for GPTQ
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
distance_weighting_method | default of MixedPrecisionQuantizationConfig | Distance weighting method for mixed precision (low priority)
use_hessian_based_scores | False | Use Hessian-based scores for mixed precision (low priority)
optimizer | default of get_keras_gptq_config or get_pytorch_gptq_config | Optimizer for GPTQ (low priority)
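GPTQ with mixed precision accepts the union of the GPTQ and mixed-precision keys. The sketch below (values illustrative) builds the override list from a dict, so each key appears at most once:

```python
# Illustrative GPTQ + mixed-precision overrides; keys come from the table above.
overrides = {
    "n_epochs": 5,
    "num_of_images": 32,
    "weights_compression_ratio": 0.75,
    "use_hessian_based_scores": True,
}
param_items = [[key, value] for key, value in overrides.items()]
```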