wrapper

class model_compression_toolkit.wrapper.mct_wrapper.MCTWrapper

Wrapper class for Model Compression Toolkit (MCT) quantization and export.

This class provides a unified interface for neural network quantization methods, including Post-Training Quantization (PTQ) and Gradient Post-Training Quantization (GPTQ). It supports both the TensorFlow and PyTorch frameworks, with optional mixed-precision quantization.

The wrapper manages the complete quantization pipeline from model input to quantized model export, handling framework-specific configurations and Target Platform Capabilities (TPC) setup.

quantize_and_export(float_model, representative_dataset, framework='pytorch', method='PTQ', use_mixed_precision=False, param_items=None)

Main function to perform model quantization and export.

Return type:

Tuple[bool, Any]

Parameters:
  • float_model – The float model to be quantized.

  • representative_dataset (Callable, np.ndarray, or tf.Tensor) – Representative dataset for calibration.

  • framework (str) – ‘tensorflow’ or ‘pytorch’. Default: ‘pytorch’

  • method (str) – Quantization method, e.g., ‘PTQ’ or ‘GPTQ’. Default: ‘PTQ’

  • use_mixed_precision (bool) – Whether to use mixed-precision quantization. Default: False

  • param_items (list) – List of parameter settings in the form [[key, value], …]. Default: None

Returns:

tuple (quantization success flag, quantized model)

Examples

Import MCT

>>> import model_compression_toolkit as mct

Prepare the float model and dataset

>>> float_model = ...
>>> representative_dataset = ...

Create an instance of the MCTWrapper

>>> wrapper = mct.MCTWrapper()

Set framework, method, and other parameters

>>> framework = 'tensorflow'
>>> method = 'PTQ'
>>> use_mixed_precision = False

Set parameters if needed

>>> param_items = [[key, value], ...]

Quantize and export the model

>>> flag, quantized_model = wrapper.quantize_and_export(
...     float_model=float_model,
...     representative_dataset=representative_dataset,
...     framework=framework,
...     method=method,
...     use_mixed_precision=use_mixed_precision,
...     param_items=param_items
... )
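In the example above, representative_dataset is left as a placeholder. A minimal sketch of a calibration data generator using random NumPy data (the batch size and input shape are illustrative assumptions; in practice, yield samples drawn from your real training or validation data):

```python
import numpy as np

# Illustrative settings -- adjust to your model's actual input.
n_iter = 10                 # number of calibration batches
batch_size = 4
input_shape = (224, 224, 3)

def representative_dataset():
    """Yield one list of input arrays per calibration iteration."""
    for _ in range(n_iter):
        yield [np.random.rand(batch_size, *input_shape).astype(np.float32)]
```

The generator is then passed as the representative_dataset argument to quantize_and_export.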

Parameters

Initialize MCTWrapper with default parameters.

Users can override the following parameters via param_items.

Note

Low-priority parameters can usually be left at their default values, so there is no need to specify them. Set them only when necessary, for example if you receive a warning from the XQuant Extension Tool.

PTQ

  • sdsp_version (default: '3.14') – Specifying the SDSP converter version selects the optimal quantization settings for IMX500.

  • save_model_path (default: './qmodel.keras' / './qmodel.onnx') – Path to save the quantized model (Keras / PyTorch).

  • activation_error_method (default: mct.core.QuantizationErrorMethod.MSE) – Activation quantization error method (low priority).

  • weights_bias_correction (default: True) – Enable weights bias correction (low priority).

  • z_threshold (default: float('inf')) – Z-threshold for quantization (low priority).

  • linear_collapsing (default: True) – Enable linear layer collapsing (low priority).

  • residual_collapsing (default: True) – Enable residual connection collapsing (low priority).
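Parameter overrides are passed as [key, value] pairs. An illustrative PTQ param_items list (keys are taken from the table above; the path and values are examples, not requirements):

```python
# Illustrative PTQ overrides; each entry is a [key, value] pair.
param_items = [
    ['sdsp_version', '3.14'],              # SDSP converter version
    ['save_model_path', './qmodel.onnx'],  # export path for a PyTorch model
    ['weights_bias_correction', True],     # low priority; default is already True
]

# The pair list maps cleanly onto a dict for inspection.
overrides = dict(param_items)
```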

PTQ, mixed_precision

  • sdsp_version (default: '3.14') – Specifying the SDSP converter version selects the optimal quantization settings for IMX500.

  • save_model_path (default: './qmodel.keras' / './qmodel.onnx') – Path to save the quantized model (Keras / PyTorch).

  • num_of_images (default: 32) – Number of images used for mixed precision.

  • weights_compression_ratio (default: 0.75) – Weights compression ratio for mixed-precision resource utilization (0.0~1.0).

  • activation_error_method (default: mct.core.QuantizationErrorMethod.MSE) – Activation quantization error method (low priority).

  • weights_bias_correction (default: True) – Enable weights bias correction (low priority).

  • z_threshold (default: float('inf')) – Z-threshold for quantization (low priority).

  • linear_collapsing (default: True) – Enable linear layer collapsing (low priority).

  • residual_collapsing (default: True) – Enable residual connection collapsing (low priority).

  • distance_weighting_method (default: default of MixedPrecisionQuantizationConfig) – Distance weighting method for mixed precision (low priority).

  • use_hessian_based_scores (default: False) – Use Hessian-based scores for mixed precision (low priority).
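For mixed-precision PTQ, the additional knobs are num_of_images and weights_compression_ratio. A sketch with illustrative values, including a range check on the ratio (the check itself is an assumption for defensive use, not wrapper behavior):

```python
# Illustrative mixed-precision overrides; values are examples only.
param_items = [
    ['num_of_images', 32],                # images used during mixed-precision search
    ['weights_compression_ratio', 0.75],  # must lie in 0.0~1.0 per the table above
]

ratio = dict(param_items)['weights_compression_ratio']
if not 0.0 <= ratio <= 1.0:
    raise ValueError('weights_compression_ratio must be in [0.0, 1.0]')
```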

GPTQ

  • sdsp_version (default: '3.14') – Specifying the SDSP converter version selects the optimal quantization settings for IMX500.

  • save_model_path (default: './qmodel.keras' / './qmodel.onnx') – Path to save the quantized model (Keras / PyTorch).

  • n_epochs (default: 5) – Number of training epochs for GPTQ.

  • activation_error_method (default: mct.core.QuantizationErrorMethod.MSE) – Activation quantization error method (low priority).

  • weights_bias_correction (default: True) – Enable weights bias correction (low priority).

  • z_threshold (default: float('inf')) – Z-threshold for quantization (low priority).

  • linear_collapsing (default: True) – Enable linear layer collapsing (low priority).

  • residual_collapsing (default: True) – Enable residual connection collapsing (low priority).

  • optimizer (default: default of get_keras_gptq_config or get_pytorch_gptq_config) – Optimizer for GPTQ (low priority).
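Since param_items is a flat [[key, value], …] list, overrides can be thought of as being merged over the defaults in the table above. A hypothetical illustration of that merge (the defaults dict here is a stand-in for this sketch, not the wrapper's internal state):

```python
# Hypothetical defaults (a subset of the GPTQ table, for illustration only).
defaults = {'n_epochs': 5, 'sdsp_version': '3.14', 'weights_bias_correction': True}

# User override: train GPTQ for more epochs than the default 5.
param_items = [['n_epochs', 10]]

# [key, value] entries win over the defaults.
settings = {**defaults, **dict(param_items)}
```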

GPTQ, mixed_precision

  • sdsp_version (default: '3.14') – Specifying the SDSP converter version selects the optimal quantization settings for IMX500.

  • save_model_path (default: './qmodel.keras' / './qmodel.onnx') – Path to save the quantized model (Keras / PyTorch).

  • num_of_images (default: 32) – Number of images used for mixed precision.

  • weights_compression_ratio (default: 0.75) – Weights compression ratio for mixed-precision resource utilization (0.0~1.0).

  • n_epochs (default: 5) – Number of training epochs for GPTQ.

  • activation_error_method (default: mct.core.QuantizationErrorMethod.MSE) – Activation quantization error method (low priority).

  • weights_bias_correction (default: True) – Enable weights bias correction (low priority).

  • z_threshold (default: float('inf')) – Z-threshold for quantization (low priority).

  • linear_collapsing (default: True) – Enable linear layer collapsing (low priority).

  • residual_collapsing (default: True) – Enable residual connection collapsing (low priority).

  • distance_weighting_method (default: default of MixedPrecisionQuantizationConfig) – Distance weighting method for mixed precision (low priority).

  • use_hessian_based_scores (default: False) – Use Hessian-based scores for mixed precision (low priority).

  • optimizer (default: default of get_keras_gptq_config or get_pytorch_gptq_config) – Optimizer for GPTQ (low priority).
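A combined GPTQ + mixed-precision override list might look like the following sketch (all values are illustrative examples of the keys in the table above):

```python
# Illustrative GPTQ + mixed-precision overrides.
param_items = [
    ['n_epochs', 5],                        # GPTQ fine-tuning epochs
    ['num_of_images', 32],                  # mixed-precision evaluation images
    ['weights_compression_ratio', 0.75],    # 0.0~1.0
    ['save_model_path', './qmodel.keras'],  # export path for a Keras model
]

config = dict(param_items)
```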