Conversation

sayakpaul
Member

@sayakpaul sayakpaul commented Sep 4, 2025

What does this PR do?

onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
apply_group_offloading(
    text_encoder,
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True
)
transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True
)
apply_group_offloading(
    vae,
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True
)

to

pipe.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True
)

Of course, if users still want to apply different offloading techniques to different model-level components, they can easily choose to do so. But IMO, enable_group_offload() is an easier entrypoint.

We can allow users to pass mappings like we do for quant_mapping in PipelineQuantizationConfig in the future.
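A mapping-based API like that might resolve per-component settings with a pipeline-wide fallback. The sketch below is purely illustrative of that idea; `offload_mapping` and `resolve_offload_kwargs` are hypothetical names, not part of this PR or of diffusers:

```python
# Hypothetical sketch of a per-component offload mapping, analogous to
# quant_mapping in PipelineQuantizationConfig. Names are illustrative
# assumptions, not diffusers API.

DEFAULT_OFFLOAD_KWARGS = {"offload_type": "leaf_level", "use_stream": True}

def resolve_offload_kwargs(component_name, offload_mapping=None):
    """Return the offload kwargs for one pipeline component.

    Falls back to the pipeline-wide defaults when the component has
    no entry in the mapping.
    """
    offload_mapping = offload_mapping or {}
    kwargs = dict(DEFAULT_OFFLOAD_KWARGS)
    kwargs.update(offload_mapping.get(component_name, {}))
    return kwargs

mapping = {"vae": {"offload_type": "block_level"}}
print(resolve_offload_kwargs("vae", mapping))
# → {'offload_type': 'block_level', 'use_stream': True}
print(resolve_offload_kwargs("transformer", mapping))
# → {'offload_type': 'leaf_level', 'use_stream': True}
```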

Will request reviews after CI.

TODOs

record_stream: bool = False,
low_cpu_mem_usage=False,
offload_to_disk_path: Optional[str] = None,
exclude_modules: Optional[Union[str, List[str]]] = None,
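For the `exclude_modules` item, one plausible behavior is normalizing a single string into a list and filtering it out of the offload candidates. This is a sketch of that idea only, not the actual diffusers implementation:

```python
# Illustrative sketch of exclude_modules handling; not diffusers code.
from typing import List, Optional, Union

def filter_offload_targets(
    module_names: List[str],
    exclude_modules: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Drop excluded modules from the list of offload candidates."""
    if exclude_modules is None:
        return list(module_names)
    if isinstance(exclude_modules, str):
        exclude_modules = [exclude_modules]
    excluded = set(exclude_modules)
    return [name for name in module_names if name not in excluded]

print(filter_offload_targets(["text_encoder", "transformer", "vae"], "vae"))
# → ['text_encoder', 'transformer']
```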
Member Author


I think it's okay to expose this as an argument as opposed to how we do model_cpu_offload_seq, for example:

model_cpu_offload_seq = "text_encoder->text_encoder_2->image_encoder->transformer->vae"

This is because model CPU offloading relies on a sequence for device management. I don't think we have that constraint in the case of group offloading.
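To illustrate why the sequence matters for model CPU offloading: the `"->"`-delimited string fixes an explicit component order. A minimal sketch of parsing such a sequence (the helper name is an assumption for illustration):

```python
# Minimal sketch: model_cpu_offload_seq encodes an ordering constraint
# that group offloading does not need. Parsing it yields an ordered list
# of component names.

def parse_offload_seq(seq: str) -> list:
    """Split the '->'-delimited sequence into an ordered component list."""
    return [name.strip() for name in seq.split("->") if name.strip()]

order = parse_offload_seq("text_encoder->text_encoder_2->image_encoder->transformer->vae")
print(order)
# → ['text_encoder', 'text_encoder_2', 'image_encoder', 'transformer', 'vae']
```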

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul requested review from DN6 and a-r-r-o-w September 4, 2025 07:14
@stevhliu
Copy link
Member

stevhliu commented Sep 4, 2025

Really neat how simplified this has become! 🎉

Contributor

@a-r-r-o-w a-r-r-o-w left a comment


Nice! We could reduce LoC by creating a kwargs dict and passing it into both branches, but definitely not a blocker

Tests look good but maybe the two could be combined into one to reduce total run count
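The kwargs-dict suggestion above could look roughly like the sketch below. The stub `route_offload` and the string stand-ins for components are assumptions for illustration; the real code would call `apply_group_offloading` or the component's `enable_group_offload` with the same shared kwargs:

```python
# Sketch of the reviewer's suggestion: build the shared offload kwargs once
# and hand them to whichever branch applies. route_offload is a hypothetical
# stand-in for the dispatch logic in the PR.

def route_offload(component, group_offload_kwargs, has_method):
    """Dispatch one component to the appropriate offloading entrypoint."""
    if has_method:
        # components exposing enable_group_offload (e.g. ModelMixin subclasses)
        return ("enable_group_offload", component, group_offload_kwargs)
    # everything else goes through the functional apply_group_offloading API
    return ("apply_group_offloading", component, group_offload_kwargs)

group_offload_kwargs = {
    "onload_device": "cuda",
    "offload_device": "cpu",
    "offload_type": "leaf_level",
    "use_stream": True,
}
print(route_offload("transformer", group_offload_kwargs, has_method=True)[0])
# → enable_group_offload
```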

@sayakpaul sayakpaul merged commit 4345907 into main Sep 10, 2025
34 checks passed