Conversation

dsocek
Contributor

@dsocek dsocek commented Sep 10, 2025

What does this PR do?

The latest Gaudi stack defaults to eager mode, in which HPU/GPU Migration runs the default scaled dot-product (SDP) attention kernel in FP32 precision. The previous default, lazy mode, ran SDP in BF16 precision.

To improve performance, this PR makes BF16 SDP the default when running on HPU, provided that the requested dtype is BF16 or unset.
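
The selection rule described above can be sketched as a small predicate. This is only an illustration, not the code in this PR: the function name `should_use_bf16_sdp` and the use of plain strings for the device type and dtype are assumptions made so the example runs without PyTorch or the Gaudi stack installed.

```python
from typing import Optional


def should_use_bf16_sdp(device_type: str, dtype: Optional[str] = None) -> bool:
    """Return True when SDP should run in BF16 under the rule described above.

    Hypothetical helper: the name and the string-based arguments are
    assumptions for illustration, not part of the diffusers API.
    """
    # The new default only applies when the target device is an HPU.
    if device_type != "hpu":
        return False
    # BF16 SDP is chosen when the requested dtype is BF16 or was left unset.
    return dtype in (None, "bfloat16")


if __name__ == "__main__":
    cases = [("hpu", None), ("hpu", "bfloat16"), ("hpu", "float32"), ("cuda", "bfloat16")]
    for device, dtype in cases:
        print(device, dtype, should_use_bf16_sdp(device, dtype))
```

Under this sketch, an explicitly requested non-BF16 dtype such as FP32 keeps the FP32 SDP path, so a caller who asks for higher precision is not silently downgraded.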

@dsocek dsocek force-pushed the hpu_sdp_on_bf16_support branch from de8e8ef to 8e73b58 Compare September 10, 2025 17:49
@astachowiczhabana

LGTM
@regisss can you please approve/merge?

Contributor

@regisss regisss left a comment

LGTM

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss
Contributor

regisss commented Sep 12, 2025

@dsocek It seems there is a formatting error: https://github.com/huggingface/diffusers/actions/runs/17622299913/job/50216966858?pr=12310

@dsocek
Contributor Author

dsocek commented Sep 12, 2025

> @dsocek It seems there is a formatting error: https://github.com/huggingface/diffusers/actions/runs/17622299913/job/50216966858?pr=12310

@regisss fixed the minor formatting issue

@dsocek dsocek force-pushed the hpu_sdp_on_bf16_support branch from 77b888f to b05259a Compare September 12, 2025 12:42
Collaborator

@yiyixuxu yiyixuxu left a comment

thanks!

@yiyixuxu yiyixuxu merged commit f5c113e into huggingface:main Sep 12, 2025
28 of 30 checks passed
5 participants