[OpenVINO][Examples] Add Quantization for the OpenVINO Stable Diffusion Example#17807
anzr299 wants to merge 7 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17807
Note: Links to docs will display an error until the docs builds have been completed.
This PR needs a
…fusion component model names reliable
daniil-lyakhov left a comment:
In general:
I think the "maybe" logic is not worth it here; it would be nicer to have separate `quantize_unet` and `compress_model` functions in each export function.
Right now the diamond structure makes the export look more complicated than it really is.
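A minimal sketch of the refactor being suggested (all helper names below are illustrative, not the PR's actual API): each export path calls exactly one explicit helper instead of sharing a "maybe quantize" branch.

```python
# Illustrative sketch of the suggested split. quantize_unet / compress_model
# are hypothetical names; their bodies are placeholders for real quantization
# calls (e.g. calibration-based quantization vs. weights-only compression).

def quantize_unet(model, calibration_dataset):
    """Activation + weight quantization: only the UNet needs calibration data."""
    return {"model": model, "mode": "int8", "calibrated": calibration_dataset is not None}

def compress_model(model):
    """Weights-only compression for the remaining pipeline components."""
    return {"model": model, "mode": "int8-weights-only", "calibrated": False}

def export_component(name, model, calibration_dataset=None):
    # Each export path picks exactly one helper, keeping the control flow
    # flat instead of routing everything through one shared "diamond".
    if name == "unet":
        return quantize_unet(model, calibration_dataset)
    return compress_model(model)
```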
```python
from executorch.exir.backend.backend_details import CompileSpec
from torch.export import export
from torchao.quantization.pt2e.quantizer.quantizer import Quantizer
from tqdm import tqdm  # type: ignore[import-untyped]
```
Why is the type ignore here?

Reply: It was suggested by the lintrunner.
```python
# Configure OpenVINO compilation
compile_spec = [CompileSpec("device", device.encode())]
partitioner = OpenvinoPartitioner(compile_spec)

# Lower to edge dialect and apply OpenVINO backend
edge_manager = to_edge_transform_and_lower(
    exported_program, partitioner=[partitioner]
)
```
Ah yes. Great catch!
```python
if not is_quantization_enabled:
    return model
```
Please keep only the code that could raise an error inside the try block.
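The scoping point can be sketched like this (function and argument names are hypothetical): only the call that can actually fail stays inside the `try`, while the dtype restore belongs in `finally`.

```python
# Hypothetical sketch: keep only the potentially-raising call inside `try`.
def collect_calibration_data(pipeline, set_dtype, get_dataset, original_dtype):
    # Switch dtype up front; if this call itself cannot raise, it does not
    # need to live inside the try block.
    set_dtype(pipeline, "float32")
    try:
        # Only the data collection can fail here.
        return get_dataset(pipeline)
    finally:
        # Always restore the pipeline's original dtype.
        set_dtype(pipeline, original_dtype)
```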
```python
# Quantize activations for the UNet model. Other models are weights-only quantized.
pipeline = self.model_loader.pipeline
try:
    # We need the models in FP32 to run inference for calibration data collection
    self._set_pipeline_dtype(pipeline, torch.float32)
    calibration_dataset = self.get_unet_calibration_dataset(pipeline)
finally:
    self._set_pipeline_dtype(pipeline, self.model_loader.dtype)

quantized_model = quantize_model(
    model,
    mode=QuantizationMode.INT8_TRANSFORMER,
    calibration_dataset=calibration_dataset,
    smooth_quant=True,
)
```
This `if` body is so big that it's worth splitting the function in two, e.g. `quantize` and `compress`.
```python
def forward(self, *args, **kwargs):
    """
    obtain and pass each input individually to ensure the order is maintained
```

Suggested change:

```diff
-    obtain and pass each input individually to ensure the order is maintained
+    Obtain and pass each input individually to ensure the order is maintained
```
```python
dataset = datasets.load_dataset(
    "google-research-datasets/conceptual_captions",
    split="train",
    trust_remote_code=True,
).shuffle(seed=42)
```
Maybe make the dataset name a parameter, with this one as the default?
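One way that suggestion could look (a sketch, not the PR's code; the `loader` injection point is added only to keep the example self-contained): the dataset name becomes a keyword argument whose default is the currently hard-coded value.

```python
def load_calibration_captions(
    dataset_name="google-research-datasets/conceptual_captions",
    split="train",
    seed=42,
    loader=None,
):
    """Load and shuffle the calibration dataset; the name is now configurable."""
    if loader is None:
        # In the real example this would be datasets.load_dataset from the
        # Hugging Face `datasets` package.
        import datasets
        loader = datasets.load_dataset
    return loader(dataset_name, split=split, trust_remote_code=True).shuffle(seed=seed)
```

Callers keep the current behavior by default but can swap in a smaller dataset for quick calibration runs.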
```python
wrapped_unet = UNetWrapper(pipeline.unet, pipeline.unet.config)
pipeline.unet = wrapped_unet
# Run inference for data collection
pbar = tqdm(total=calibration_dataset_size)
```
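For context on why such a wrapper exists (a generic sketch, not the PR's actual `UNetWrapper`): intercepting forward calls lets the example record real inference inputs, which then serve as the calibration dataset.

```python
# Generic sketch of an input-recording wrapper; RecordingWrapper is a
# hypothetical stand-in for the PR's UNetWrapper.
class RecordingWrapper:
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.captured = []  # inputs collected during calibration inference

    def __call__(self, *args, **kwargs):
        # Record the exact positional/keyword inputs, preserving their order,
        # then delegate to the wrapped model unchanged.
        self.captured.append((args, kwargs))
        return self.wrapped(*args, **kwargs)
```

During the calibration runs, `captured` accumulates `(args, kwargs)` tuples that can later be replayed through the quantizer.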
Maybe executorch has some sort of progress bar already? The fewer dependencies, the better.

Reply: tqdm is already used in multiple places inside the executorch examples too.
```python
if self.should_quantize_model(sd_model_component):
    # Quantize activations for the UNet model. Other models are weights-only quantized.
    pipeline = self.model_loader.pipeline
    try:
        # We need the models in FP32 to run inference for calibration data collection
        self._set_pipeline_dtype(pipeline, torch.float32)
        calibration_dataset = self.get_unet_calibration_dataset(pipeline)
    finally:
        self._set_pipeline_dtype(pipeline, self.model_loader.dtype)
```
If the calibration dataset is set for stable diffusion under some condition anyway, I don't see the value in the `should_quantize_model` method.
Summary
Extend the stable diffusion example for OpenVINO backend with quantization support.