
Conversation

sayakpaul
Member

What does this PR do?

Many models carry "# Copied from ..." implementations of attn_processors and set_attn_processor. These are essentially identical to what is already implemented in the AttentionMixin class.

This PR makes those models inherit from AttentionMixin and removes the copied-over implementations (a simplified sketch of the pattern is included below).

I decided to leave fuse_qkv_projections and unfuse_qkv_projections out of this PR because some models don't have attention processors implemented in a way that would make this seamless. But removing the duplicated methods, as this PR does, should be harmless.
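
To illustrate, here is a minimal, self-contained sketch of the deduplication this PR performs. The AttentionMixin body below mirrors the general idea (walking the nn.Module tree to collect and assign processors) but is simplified; Attention and MyTransformer2DModel are hypothetical stand-ins for illustration, not diffusers classes.

```python
# A minimal sketch, assuming simplified stand-ins: the real AttentionMixin in
# diffusers walks the torch.nn module tree in the same spirit, but Attention
# and MyTransformer2DModel below are hypothetical placeholders, not library code.
from typing import Dict, Union

import torch.nn as nn


class AttentionMixin:
    """attn_processors / set_attn_processor defined once, shared by inheritance."""

    @property
    def attn_processors(self) -> Dict[str, object]:
        # Collect every processor in the module tree, keyed by submodule path.
        processors = {}
        for name, module in self.named_modules():
            if hasattr(module, "get_processor"):
                processors[f"{name}.processor"] = module.get_processor()
        return processors

    def set_attn_processor(self, processor: Union[object, Dict[str, object]]) -> None:
        # Assign a single processor, or a per-layer dict, to every attention block.
        count = len(self.attn_processors)
        if isinstance(processor, dict) and len(processor) != count:
            raise ValueError(
                f"Expected a dict with {count} processors, got {len(processor)}."
            )
        for name, module in self.named_modules():
            if hasattr(module, "set_processor"):
                if isinstance(processor, dict):
                    module.set_processor(processor.pop(f"{name}.processor"))
                else:
                    module.set_processor(processor)


class Attention(nn.Module):
    """Toy attention block that just stores a swappable processor."""

    def __init__(self):
        super().__init__()
        self.processor = object()

    def get_processor(self):
        return self.processor

    def set_processor(self, processor):
        self.processor = processor


# Before this PR, a model like this carried its own "# Copied from ..." copies of
# attn_processors and set_attn_processor. After this PR it simply inherits them:
class MyTransformer2DModel(AttentionMixin, nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = Attention()


model = MyTransformer2DModel()
print(model.attn_processors)        # {'attn.processor': <object ...>}
model.set_attn_processor(object())  # swap every processor at once
```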

@sayakpaul sayakpaul requested review from DN6 and dg845 October 10, 2025 15:39
@sayakpaul sayakpaul marked this pull request as draft October 10, 2025 16:18
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul marked this pull request as ready for review October 11, 2025 03:47