
[OpenVINO][Examples] Add Quantization for the OpenVINO Stable Diffusion Example #17807

Draft
anzr299 wants to merge 7 commits into pytorch:main from anzr299:an/openvino/quantize_lcm_model

Conversation

Contributor

@anzr299 anzr299 commented Mar 3, 2026

Summary

Extend the stable diffusion example for the OpenVINO backend with quantization support.


pytorch-bot bot commented Mar 3, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17807

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 8 Awaiting Approval

As of commit ab09c86 with merge base 389ea94:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 3, 2026

github-actions bot commented Mar 3, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Contributor

@daniil-lyakhov daniil-lyakhov left a comment


In general:

I think the "maybe quantize" branching logic is not worth it here; it would be nicer to have separate quantize_unet and compress_model functions in each export function.

As it stands, the diamond-shaped export structure looks more complicated than it really is.
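To make the suggestion concrete, here is a minimal, self-contained sketch of the proposed structure. All names (quantize_unet, compress_model, the export_* functions) are hypothetical stand-ins for the PR's actual helpers, not its real API: each export path calls a dedicated helper instead of sharing one branching export function.

```python
# Hypothetical sketch of the suggested refactor: per-component helpers
# instead of a shared "maybe quantize" branch in a single export function.

def quantize_unet(model: str) -> str:
    # Activation + weight quantization for the UNet (would need calibration data).
    return f"int8({model})"

def compress_model(model: str) -> str:
    # Weights-only compression for the other pipeline components.
    return f"int8-weights({model})"

def export_unet(model: str, quantize: bool = False) -> str:
    # The UNet export path calls its own quantization helper.
    if quantize:
        model = quantize_unet(model)
    return model

def export_text_encoder(model: str, quantize: bool = False) -> str:
    # Other components call the weights-only compression helper.
    if quantize:
        model = compress_model(model)
    return model

print(export_unet("unet", quantize=True))        # int8(unet)
print(export_text_encoder("te", quantize=True))  # int8-weights(te)
```

Each export function then reads linearly, and the UNet-specific calibration logic stays out of the other components' export paths.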

from executorch.exir.backend.backend_details import CompileSpec
from torch.export import export
from torchao.quantization.pt2e.quantizer.quantizer import Quantizer
from tqdm import tqdm # type: ignore[import-untyped]
Contributor


Why is the type ignore here?

Contributor Author


It was suggested by the lintrunner

Comment on lines +242 to +248
# Configure OpenVINO compilation
compile_spec = [CompileSpec("device", device.encode())]
partitioner = OpenvinoPartitioner(compile_spec)

# Lower to edge dialect and apply OpenVINO backend
edge_manager = to_edge_transform_and_lower(
    exported_program, partitioner=[partitioner]
Contributor


Duplicate?

Contributor Author


Ah yes. Great catch!

Comment on lines +175 to +176
if not is_quantization_enabled:
    return model
Contributor


Please keep only the code that could raise an error inside the try block
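A toy, self-contained illustration of this review point: setup that is not expected to raise stays outside the try block, and the finally clause restores state whether or not the risky call fails. All names here (PipelineState, set_dtype, collect_calibration_data) are hypothetical stand-ins for the ones in the diff.

```python
# Toy illustration: keep only the call that can raise inside try,
# and restore state in finally.

class PipelineState:
    def __init__(self):
        self.dtype = "float16"

def set_dtype(pipeline, dtype):
    # Cheap setup; not expected to raise.
    pipeline.dtype = dtype

def collect_calibration_data(pipeline):
    # The one call that can actually fail.
    if pipeline.dtype != "float32":
        raise RuntimeError("calibration requires float32")
    return ["sample"]

pipeline = PipelineState()
original = pipeline.dtype
set_dtype(pipeline, "float32")  # setup stays outside the try block
try:
    dataset = collect_calibration_data(pipeline)
finally:
    # dtype is restored on both success and failure.
    set_dtype(pipeline, original)
```

Narrowing the try block this way makes it obvious which failure the finally clause is guarding against.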

Comment on lines +180 to +194
# Quantize activations for the Unet Model. Other models are weights-only quantized.
pipeline = self.model_loader.pipeline
try:
    # We need the models in FP32 to run inference for calibration data collection
    self._set_pipeline_dtype(pipeline, torch.float32)
    calibration_dataset = self.get_unet_calibration_dataset(pipeline)
finally:
    self._set_pipeline_dtype(pipeline, self.model_loader.dtype)

quantized_model = quantize_model(
    model,
    mode=QuantizationMode.INT8_TRANSFORMER,
    calibration_dataset=calibration_dataset,
    smooth_quant=True,
)
Contributor


This if body is so big that it's worth splitting the function in two, e.g. quantize and compress


def forward(self, *args, **kwargs):
    """
    obtain and pass each input individually to ensure the order is maintained
Contributor


Suggested change
obtain and pass each input individually to ensure the order is maintained
Obtain and pass each input individually to ensure the order is maintained

Comment on lines +141 to +145
dataset = datasets.load_dataset(
    "google-research-datasets/conceptual_captions",
    split="train",
    trust_remote_code=True,
).shuffle(seed=42)
Contributor


Maybe expose the dataset name as an example parameter?
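One way to apply this suggestion is a CLI flag with the current dataset as the default. This is a hypothetical sketch; the argument name `--calibration-dataset` and the parser setup are assumptions, not the PR's actual interface.

```python
# Hypothetical sketch: make the calibration dataset a CLI argument
# instead of a hard-coded string.
import argparse

parser = argparse.ArgumentParser(description="Stable Diffusion OpenVINO example")
parser.add_argument(
    "--calibration-dataset",
    default="google-research-datasets/conceptual_captions",
    help="Hugging Face dataset used to collect calibration samples",
)

# Parsing an explicit argv list here just for demonstration.
args = parser.parse_args(["--calibration-dataset", "my-org/my-captions"])
print(args.calibration_dataset)  # my-org/my-captions
```

The default keeps today's behavior, while users can point calibration at their own captions dataset.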

wrapped_unet = UNetWrapper(pipeline.unet, pipeline.unet.config)
pipeline.unet = wrapped_unet
# Run inference for data collection
pbar = tqdm(total=calibration_dataset_size)
Contributor


Maybe executorch already has some sort of progress bar? The fewer dependencies the better.

Contributor Author


tqdm is already used in multiple places in the executorch examples too.

Comment on lines +179 to +187
if self.should_quantize_model(sd_model_component):
# Quantize activations for the Unet Model. Other models are weights-only quantized.
pipeline = self.model_loader.pipeline
try:
# We need the models in FP32 to run inference for calibration data collection
self._set_pipeline_dtype(pipeline, torch.float32)
calibration_dataset = self.get_unet_calibration_dataset(pipeline)
finally:
self._set_pipeline_dtype(pipeline, self.model_loader.dtype)
Contributor


If this just means "under some condition, set the calibration dataset for stable diffusion", I don't see the value of the should_quantize_model method


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants