
[OpenVINO][Examples] Add Quantization for the OpenVINO Stable Diffusion Example #17807

Draft
anzr299 wants to merge 7 commits into pytorch:main from anzr299:an/openvino/quantize_lcm_model

Conversation

Contributor

@anzr299 anzr299 commented Mar 3, 2026

Summary

Extend the stable diffusion example for the OpenVINO backend with quantization support.


pytorch-bot bot commented Mar 3, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17807

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 8 Awaiting Approval

As of commit ab09c86 with merge base 389ea94:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 3, 2026

github-actions bot commented Mar 3, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Contributor

@daniil-lyakhov daniil-lyakhov left a comment


In general:

I think the "maybe quantize" branching logic is not worth it here; it would be nicer to have separate quantize_unet and compress_model functions in each export function.

As it stands, the diamond-shaped export structure looks more complicated than it really is.
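To make the suggestion concrete, here is a minimal, self-contained sketch of the proposed structure. All names (quantize_unet, compress_model, the export_* functions) are hypothetical stand-ins for the PR's actual helpers, not its real API: each export path calls a dedicated helper instead of sharing one branching export function.

```python
# Hypothetical sketch of the suggested refactor: per-component helpers
# instead of a shared "maybe quantize" branch in a single export function.

def quantize_unet(model: str) -> str:
    # Activation + weight quantization for the UNet (would need calibration data).
    return f"int8({model})"

def compress_model(model: str) -> str:
    # Weights-only compression for the other pipeline components.
    return f"int8-weights({model})"

def export_unet(model: str, quantize: bool = False) -> str:
    # The UNet export path calls its own quantization helper.
    if quantize:
        model = quantize_unet(model)
    return model

def export_text_encoder(model: str, quantize: bool = False) -> str:
    # Other components call the weights-only compression helper.
    if quantize:
        model = compress_model(model)
    return model

print(export_unet("unet", quantize=True))        # int8(unet)
print(export_text_encoder("te", quantize=True))  # int8-weights(te)
```

Each export function then reads linearly, and the UNet-specific calibration logic stays out of the other components' export paths.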

from executorch.exir.backend.backend_details import CompileSpec
from torch.export import export
from torchao.quantization.pt2e.quantizer.quantizer import Quantizer
from tqdm import tqdm # type: ignore[import-untyped]
Contributor


Why is the type ignore here?

Contributor Author


It was suggested by the lintrunner

Comment on lines +242 to +248
# Configure OpenVINO compilation
compile_spec = [CompileSpec("device", device.encode())]
partitioner = OpenvinoPartitioner(compile_spec)

# Lower to edge dialect and apply OpenVINO backend
edge_manager = to_edge_transform_and_lower(
    exported_program, partitioner=[partitioner]
Contributor


Duplicate?

Contributor Author


Ah yes. Great catch!

Comment on lines +175 to +176
if not is_quantization_enabled:
    return model
Contributor


Please keep only the code that could raise an error inside the try block
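A toy, self-contained illustration of this review point: setup that is not expected to raise stays outside the try block, and the finally clause restores state whether or not the risky call fails. All names here (PipelineState, set_dtype, collect_calibration_data) are hypothetical stand-ins for the ones in the diff.

```python
# Toy illustration: keep only the call that can raise inside try,
# and restore state in finally.

class PipelineState:
    def __init__(self):
        self.dtype = "float16"

def set_dtype(pipeline, dtype):
    # Cheap setup; not expected to raise.
    pipeline.dtype = dtype

def collect_calibration_data(pipeline):
    # The one call that can actually fail.
    if pipeline.dtype != "float32":
        raise RuntimeError("calibration requires float32")
    return ["sample"]

pipeline = PipelineState()
original = pipeline.dtype
set_dtype(pipeline, "float32")  # setup stays outside the try block
try:
    dataset = collect_calibration_data(pipeline)
finally:
    # dtype is restored on both success and failure.
    set_dtype(pipeline, original)
```

Narrowing the try block this way makes it obvious which failure the finally clause is guarding against.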

Comment on lines +180 to +194
# Quantize activations for the Unet Model. Other models are weights-only quantized.
pipeline = self.model_loader.pipeline
try:
    # We need the models in FP32 to run inference for calibration data collection
    self._set_pipeline_dtype(pipeline, torch.float32)
    calibration_dataset = self.get_unet_calibration_dataset(pipeline)
finally:
    self._set_pipeline_dtype(pipeline, self.model_loader.dtype)

quantized_model = quantize_model(
    model,
    mode=QuantizationMode.INT8_TRANSFORMER,
    calibration_dataset=calibration_dataset,
    smooth_quant=True,
)
Contributor


This if body is so big that it's worth splitting the function in two, e.g. quantize and compress


def forward(self, *args, **kwargs):
    """
    obtain and pass each input individually to ensure the order is maintained
Contributor


Suggested change
obtain and pass each input individually to ensure the order is maintained
Obtain and pass each input individually to ensure the order is maintained

Comment on lines +141 to +145
dataset = datasets.load_dataset(
    "google-research-datasets/conceptual_captions",
    split="train",
    trust_remote_code=True,
).shuffle(seed=42)
Contributor


Maybe expose the dataset name as an example parameter?
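One way to apply this suggestion is a CLI flag with the current dataset as the default. This is a hypothetical sketch; the argument name `--calibration-dataset` and the parser setup are assumptions, not the PR's actual interface.

```python
# Hypothetical sketch: make the calibration dataset a CLI argument
# instead of a hard-coded string.
import argparse

parser = argparse.ArgumentParser(description="Stable Diffusion OpenVINO example")
parser.add_argument(
    "--calibration-dataset",
    default="google-research-datasets/conceptual_captions",
    help="Hugging Face dataset used to collect calibration samples",
)

# Parsing an explicit argv list here just for demonstration.
args = parser.parse_args(["--calibration-dataset", "my-org/my-captions"])
print(args.calibration_dataset)  # my-org/my-captions
```

The default keeps today's behavior, while users can point calibration at their own captions dataset.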

wrapped_unet = UNetWrapper(pipeline.unet, pipeline.unet.config)
pipeline.unet = wrapped_unet
# Run inference for data collection
pbar = tqdm(total=calibration_dataset_size)
Contributor


Maybe executorch already has some sort of progress bar? The fewer dependencies the better.

Contributor Author


tqdm is already used in multiple places in the executorch examples too.

Comment on lines +179 to +187
if self.should_quantize_model(sd_model_component):
# Quantize activations for the Unet Model. Other models are weights-only quantized.
pipeline = self.model_loader.pipeline
try:
# We need the models in FP32 to run inference for calibration data collection
self._set_pipeline_dtype(pipeline, torch.float32)
calibration_dataset = self.get_unet_calibration_dataset(pipeline)
finally:
self._set_pipeline_dtype(pipeline, self.model_loader.dtype)
Contributor


If this just means "under some condition, set the calibration dataset for stable diffusion", I don't see the value of the should_quantize_model method


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants