e-ddykim
Contributor

@e-ddykim e-ddykim commented Sep 4, 2025

Details:

  • This PR extends sdpa_micro to support paged attention for better performance.
    • The mixed stage of paged attention will be handled by sdpa_micro instead of pa_sdpa_opt.
  • Additionally, this PR allows sdpa_micro to support sliding window.
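For readers unfamiliar with the sliding window mentioned above, here is a minimal sketch of the masking rule. It assumes the common causal definition, where a query at position `q_pos` only sees the last `window` keys up to and including itself. The function name `is_visible` and the convention that `window == 0` disables the window are assumptions made for illustration; they are not taken from the sdpa_micro kernel.

```cpp
#include <cstddef>

// Hypothetical helper, not part of OpenVINO.
// Returns true when a query at q_pos may attend to a key at k_pos under a
// causal sliding window of `window` tokens.
// Assumed convention for this sketch: window == 0 means "no window".
constexpr bool is_visible(std::size_t q_pos, std::size_t k_pos, std::size_t window) {
    return k_pos <= q_pos                           // causal: never look at future keys
        && (window == 0 || q_pos - k_pos < window); // key lies within the last `window` tokens
}
```

With `window = 3`, a query at position 5 would see keys 3, 4 and 5 under this rule; keys 0 to 2 fall outside the window, and key 6 is masked by causality.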

Tickets:

  • 169407, 170673, 172903, 173059

@e-ddykim e-ddykim requested a review from a team as a code owner September 4, 2025 05:39
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Sep 4, 2025
@e-ddykim e-ddykim force-pushed the sdpa_micro_pa branch 2 times, most recently from d5b9a06 to a62e2ad Compare September 15, 2025 02:11
@e-ddykim e-ddykim force-pushed the sdpa_micro_pa branch 3 times, most recently from 5da134a to d95932b Compare October 1, 2025 12:07
@e-ddykim e-ddykim requested a review from a team as a code owner October 10, 2025 12:09
@e-ddykim e-ddykim requested review from CuriousPanCake and removed request for a team October 10, 2025 12:09
@github-actions github-actions bot added the category: transformations OpenVINO Runtime library - Transformations label Oct 10, 2025
}
}

if (config.is_paged_attention && data_type_traits::is_i8_u8(K.data_type)) {
Contributor

can't we use config.is_kv_compressed?

Contributor Author

config.is_kv_compressed is used for the non-PA case. I'm not sure exactly when it gets set, but from the code I see that it requires separate scale and zp inputs when it is set. So I didn't use config.is_kv_compressed for the PA case.
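The distinction described in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's code: `SdpaConfigSketch`, `DataType`, `KvQuantPath` and `select_kv_quant_path` are hypothetical names, and the non-PA branch only reflects what the reply says (that `is_kv_compressed` belongs to the non-PA case and implies separate scale and zp inputs).

```cpp
// Hypothetical stand-ins for the real config and data type; illustration only.
enum class DataType { f16, f32, i8, u8 };

struct SdpaConfigSketch {
    bool is_paged_attention;
    bool is_kv_compressed;  // per the reply: used for the non-PA case
};

constexpr bool is_i8_u8(DataType t) {
    return t == DataType::i8 || t == DataType::u8;
}

enum class KvQuantPath {
    none,                      // KV is not quantized
    separate_scale_zp_inputs,  // non-PA: flag set, scale/zp come as separate inputs
    pa_quantized_kv            // PA: detected from the K data type, as in the reviewed check
};

// Mirrors the reviewed condition for PA and the reply's description for non-PA.
constexpr KvQuantPath select_kv_quant_path(const SdpaConfigSketch& cfg, DataType k_type) {
    return (cfg.is_paged_attention && is_i8_u8(k_type)) ? KvQuantPath::pa_quantized_kv
         : (!cfg.is_paged_attention && cfg.is_kv_compressed) ? KvQuantPath::separate_scale_zp_inputs
         : KvQuantPath::none;
}
```

Keeping the two cases as separate branches matches the reasoning in the reply: reusing the flag for PA would also pull in the separate scale and zp inputs that the flag implies.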

Contributor

@yeonbok yeonbok left a comment

Except a minor comment

@yeonbok yeonbok added this pull request to the merge queue Oct 15, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 15, 2025
@e-ddykim e-ddykim added this pull request to the merge queue Oct 15, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 15, 2025
@p-durandin p-durandin added this pull request to the merge queue Oct 15, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 15, 2025
@e-ddykim e-ddykim force-pushed the sdpa_micro_pa branch 2 times, most recently from 94065db to aea9c4c Compare October 16, 2025 04:55
@github-actions github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Oct 16, 2025
@e-ddykim e-ddykim removed the request for review from CuriousPanCake October 16, 2025 04:56
@e-ddykim e-ddykim added this pull request to the merge queue Oct 16, 2025
Merged via the queue into openvinotoolkit:master with commit f35beb7 Oct 16, 2025
187 checks passed
@e-ddykim e-ddykim deleted the sdpa_micro_pa branch October 16, 2025 13:18

Labels

category: GPU OpenVINO GPU plugin

2 participants