wrapper
- class model_compression_toolkit.wrapper.mct_wrapper.MCTWrapper
Wrapper class for Model Compression Toolkit (MCT) quantization and export.
This class provides a unified interface to several neural network quantization methods, including Post-Training Quantization (PTQ) and Gradient-based Post-Training Quantization (GPTQ). It supports both the TensorFlow and PyTorch frameworks, with optional mixed-precision quantization.
The wrapper manages the complete quantization pipeline from model input to quantized model export, handling framework-specific configurations and Target Platform Capabilities (TPC) setup.
- quantize_and_export(float_model, representative_dataset, framework='pytorch', method='PTQ', use_mixed_precision=False, param_items=None)
Main function to perform model quantization and export.
- Parameters:
float_model – The float model to be quantized.
representative_dataset (Callable, np.array, or tf.Tensor) – Representative dataset used for calibration.
framework (str) – 'tensorflow' or 'pytorch'. Default: 'pytorch'
method (str) – Quantization method, e.g., 'PTQ' or 'GPTQ'. Default: 'PTQ'
use_mixed_precision (bool) – Whether to use mixed-precision quantization. Default: False
param_items (list) – List of parameter overrides in the form [[key, value], …]. Default: None
- Return type:
Tuple[bool, Any]
- Returns:
A tuple (quantization success flag, quantized model).
Examples
Import MCT:
>>> import model_compression_toolkit as mct
Prepare the float model and representative dataset:
>>> float_model = ...
>>> representative_dataset = ...
Create an instance of the MCTWrapper:
>>> wrapper = mct.MCTWrapper()
Set the framework, method, and other options:
>>> framework = 'tensorflow'
>>> method = 'PTQ'
>>> use_mixed_precision = False
Set parameter overrides if needed:
>>> param_items = [[key, value]...]
Quantize and export the model:
>>> flag, quantized_model = wrapper.quantize_and_export(
...     float_model=float_model,
...     representative_dataset=representative_dataset,
...     framework=framework,
...     method=method,
...     use_mixed_precision=use_mixed_precision,
...     param_items=param_items
... )
Parameters
MCTWrapper is initialized with default parameters. Users can override the following parameters via param_items.
Note
Parameters marked (low priority) can usually be left at their default values, so there is no need to specify them. Override them only when necessary, for example if you receive a warning from the XQuant Extension Tool.
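As a sketch of the expected format, param_items is a plain list of two-element [key, value] lists. A small helper like the one below (hypothetical, not part of MCT) can build it from a dict of overrides:

```python
def make_param_items(overrides):
    """Convert a dict of parameter overrides into the
    [[key, value], ...] list format expected by quantize_and_export()."""
    return [[key, value] for key, value in overrides.items()]

# Only parameters you want to change need to appear;
# everything else keeps its default value.
param_items = make_param_items({
    "sdsp_version": "3.14",
    "save_model_path": "./qmodel.onnx",
})
```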
PTQ
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
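A plain-PTQ override list built from the keys in the table above might look like the following sketch (the z_threshold value of 16.0 is purely illustrative; the default is float('inf')):

```python
# Illustrative PTQ overrides; keys are taken from the PTQ parameter table.
param_items = [
    ["sdsp_version", "3.14"],               # SDSP converter version
    ["save_model_path", "./qmodel.keras"],  # Keras output path (TensorFlow)
    ["z_threshold", 16.0],                  # illustrative replacement for float('inf')
]
```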
PTQ, mixed_precision
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
num_of_images | 32 | Number of images used for the mixed-precision search
weights_compression_ratio | 0.75 | Weights compression ratio for mixed-precision resource utilization (0.0 to 1.0)
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
distance_weighting_method | default of MixedPrecisionQuantizationConfig | Distance weighting method for mixed precision (low priority)
use_hessian_based_scores | False | Use Hessian-based scores for mixed precision (low priority)
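Mixed precision adds num_of_images and weights_compression_ratio to the plain-PTQ keys. The sketch below (values illustrative) includes a sanity check on the documented 0.0 to 1.0 range:

```python
num_of_images = 32               # images used during the mixed-precision search
weights_compression_ratio = 0.5  # illustrative: keep half the weight footprint

# The parameter table documents the valid range as 0.0 to 1.0.
assert 0.0 <= weights_compression_ratio <= 1.0

param_items = [
    ["num_of_images", num_of_images],
    ["weights_compression_ratio", weights_compression_ratio],
]
```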
GPTQ
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
n_epochs | 5 | Number of training epochs for GPTQ
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
optimizer | default of get_keras_gptq_config or get_pytorch_gptq_config | Optimizer for GPTQ (low priority)
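GPTQ introduces n_epochs on top of the PTQ keys. A minimal override sketch (the epoch count of 10 is illustrative, not a recommendation):

```python
# Illustrative GPTQ overrides; keys are taken from the GPTQ parameter table.
param_items = [
    ["n_epochs", 10],                      # illustrative: more epochs than the default 5
    ["save_model_path", "./qmodel.onnx"],  # ONNX output path (PyTorch)
]
# Hypothetical call shape, following the Examples section above:
# flag, qmodel = wrapper.quantize_and_export(
#     float_model, representative_dataset,
#     framework='pytorch', method='GPTQ', param_items=param_items)
```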
GPTQ, mixed_precision
Parameter Key | Default Value | Description
sdsp_version | '3.14' | SDSP converter version; specifying it selects the optimal quantization settings for IMX500.
save_model_path | './qmodel.keras' / './qmodel.onnx' | Path to save the quantized model (Keras / PyTorch)
num_of_images | 32 | Number of images used for the mixed-precision search
weights_compression_ratio | 0.75 | Weights compression ratio for mixed-precision resource utilization (0.0 to 1.0)
n_epochs | 5 | Number of training epochs for GPTQ
activation_error_method | mct.core.QuantizationErrorMethod.MSE | Activation quantization error method (low priority)
weights_bias_correction | True | Enable weights bias correction (low priority)
z_threshold | float('inf') | Z-threshold for quantization (low priority)
linear_collapsing | True | Enable linear layer collapsing (low priority)
residual_collapsing | True | Enable residual connection collapsing (low priority)
distance_weighting_method | default of MixedPrecisionQuantizationConfig | Distance weighting method for mixed precision (low priority)
use_hessian_based_scores | False | Use Hessian-based scores for mixed precision (low priority)
optimizer | default of get_keras_gptq_config or get_pytorch_gptq_config | Optimizer for GPTQ (low priority)
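GPTQ with mixed precision accepts the union of the GPTQ and mixed-precision keys. The sketch below (values illustrative) builds the override list from a dict, so each key appears at most once:

```python
# Illustrative GPTQ + mixed-precision overrides; keys come from the table above.
overrides = {
    "n_epochs": 5,
    "num_of_images": 32,
    "weights_compression_ratio": 0.75,
    "use_hessian_based_scores": True,
}
param_items = [[key, value] for key, value in overrides.items()]
```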