[GPU] sdpa_micro for prefix caching #31968
Conversation
if (config.is_paged_attention && data_type_traits::is_i8_u8(K.data_type)) {
Can't we use config.is_kv_compressed?
The config.is_kv_compressed flag is being used for the non-PA case. I'm not sure when it is used, but from the code I see that it requires separate scale and zp inputs when config.is_kv_compressed is set. So I didn't use config.is_kv_compressed for the PA case.
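For context, here is a minimal sketch of the distinction being discussed, assuming hypothetical types (SdpaConfig, TensorDesc and the helper is_i8_u8 below are stand-ins, not the actual kernel-selector code): the non-PA path signals compression via config.is_kv_compressed together with dedicated scale/zp inputs, while the PA path infers compression from the K cache element type.

```cpp
// Illustrative sketch only; struct layouts and names are hypothetical.
#include <iostream>

enum class DataType { f16, f32, i8, u8 };

struct TensorDesc {
    DataType data_type;
};

struct SdpaConfig {
    bool is_paged_attention = false;  // request comes from the paged-attention path
    bool is_kv_compressed = false;    // non-PA path: compressed KV with explicit scale/zp inputs
};

static bool is_i8_u8(DataType dt) {
    return dt == DataType::i8 || dt == DataType::u8;
}

// Decides whether the kernel must treat the KV cache as quantized.
// - Non-PA case: is_kv_compressed is set and separate scale/zp tensors are supplied.
// - PA case: compression is inferred from the K cache element type instead.
bool kv_needs_dequant(const SdpaConfig& config, const TensorDesc& K) {
    if (config.is_kv_compressed)
        return true;  // scale/zp arrive as dedicated kernel inputs
    if (config.is_paged_attention && is_i8_u8(K.data_type))
        return true;  // quantized paged KV cache, no separate scale/zp inputs assumed
    return false;
}

int main() {
    SdpaConfig pa_cfg;
    pa_cfg.is_paged_attention = true;
    TensorDesc k_cache{DataType::i8};
    std::cout << std::boolalpha << kv_needs_dequant(pa_cfg, k_cache) << "\n";  // true
    return 0;
}
```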
Except for a minor comment.
Details:
- Extend sdpa_micro to support paged attention for better performance.
- The mixed stage of paged attention will be handled by sdpa_micro instead of pa_sdpa_opt (see the sketch below).
- Extend sdpa_micro to support sliding window.

Tickets:
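As a rough illustration of the mixed-stage dispatch mentioned above (the enum and function names here are hypothetical stand-ins, not the actual GPU plugin API), the change can be thought of as routing the mixed stage to the micro-kernel SDPA when it is supported, with pa_sdpa_opt kept as the fallback.

```cpp
// Illustrative sketch only; names are hypothetical, not the plugin's real API.
#include <iostream>
#include <string>

// Hypothetical stage enum for paged-attention requests.
enum class PagedAttentionStage { GENERATE, PREFILL, MIXED };

// Per the PR description, the mixed stage (e.g. prefix caching) is the one that
// moves from pa_sdpa_opt to sdpa_micro; other stages are left out of this sketch.
std::string select_sdpa_kernel(PagedAttentionStage stage, bool micro_supported) {
    if (stage == PagedAttentionStage::MIXED && micro_supported)
        return "sdpa_micro";
    return "pa_sdpa_opt";
}

int main() {
    std::cout << select_sdpa_kernel(PagedAttentionStage::MIXED, /*micro_supported=*/true) << "\n";
    return 0;
}
```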