
Conversation

kaixuanliu
Contributor

No description provided.

@kaixuanliu kaixuanliu marked this pull request as ready for review September 25, 2025 07:45
@kaixuanliu
Contributor Author

kaixuanliu commented Sep 25, 2025

I benchmarked the Qwen/Qwen3-4B-Instruct-2507 model: on Intel XPU this gives a ~10% improvement in end-to-end (E2E) time, while on A100 there is no noticeable improvement or regression. Please let me know if this manner of applying the rotary kernel is acceptable, and I will then add support for more models.

Signed-off-by: Liu, Kaixuan <[email protected]>
@kaixuanliu kaixuanliu marked this pull request as draft September 25, 2025 08:43
@kaixuanliu kaixuanliu marked this pull request as ready for review September 25, 2025 10:26
@Rocketknight1
Member

cc @ArthurZucker

Contributor

@MekkCyber MekkCyber left a comment


Thanks for this integration @kaixuanliu! I left a few nits to consider.

Comment on lines 517 to 519
global use_kernels
use_kernels = getattr(self, "use_kernels", False)

Contributor


It would be better to pass an attention kwarg, e.g. use_rotary_kernel, than to define a global variable like this.
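A minimal sketch of the reviewer's suggestion: thread the flag through the attention call as an explicit kwarg instead of mutating module-level state. All names here (Qwen3AttentionSketch, the stand-in rotary functions) are illustrative, not the actual transformers API.

```python
def apply_rotary_pos_emb(q, k):
    """Stand-in for the eager rotary implementation."""
    return ("eager", q, k)

def apply_rotary_kernel(q, k):
    """Stand-in for the hub-kernel rotary implementation."""
    return ("kernel", q, k)

class Qwen3AttentionSketch:
    def forward(self, q, k, use_rotary_kernel: bool = False, **kwargs):
        # The flag arrives as an explicit kwarg, so no global state
        # is needed and each call site can opt in independently.
        rotate = apply_rotary_kernel if use_rotary_kernel else apply_rotary_pos_emb
        return rotate(q, k)
```

Compared with a global, the kwarg keeps the choice local to each forward call and visible in the function signature.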

Contributor Author


You mean add a param called use_rotary_kernel to the kwargs here and pass it down to Qwen3Attention?

from ...cache_utils import Cache, DynamicCache
from ...generation import GenerationMixin
from ...integrations import use_kernel_forward_from_hub
from ...integrations.hub_kernels import rotary_kernel
Contributor


I think we need to lazily load the kernel; here we are loading it before we even know whether the user wants to use kernels.
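A sketch of the lazy-loading pattern being suggested: resolve the kernel only on first use, not at module import time. This is a hypothetical stand-in; the real integration would fetch the kernel from the hub (e.g. via the kernels library), and the function names here are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_rotary_kernel():
    # In the real integration this would download/resolve the hub kernel
    # on first call; lru_cache makes subsequent calls return the same object.
    def apply_rotary(q1, q2, cos, sin, out1, out2, conj):
        pass  # placeholder for the fused rotary op
    return apply_rotary

def maybe_get_kernel(use_kernels: bool):
    # The (potentially expensive) load only happens when the model
    # actually opts into kernels.
    return get_rotary_kernel() if use_kernels else None
```

With this shape, a plain `from ...integrations.hub_kernels import rotary_kernel` at the top of the modeling file is replaced by a call made inside the forward path, guarded by the user's choice.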

Contributor Author

@kaixuanliu kaixuanliu Sep 26, 2025


Thanks for your advice! I have updated the related code.

Comment on lines 125 to 148
def apply_rotary_kernel(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    """
    Rotary kernel implementation wrapper.
    Adapts the rotary kernel implementation to match the HuggingFace apply_rotary_pos_emb signature.
    """
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)

    q_rotated = q.clone()
    k_rotated = k.clone()

    # Get half dimension for rotation
    half_dim = q.shape[-1] // 2
    q1 = q_rotated[..., :half_dim]
    q2 = q_rotated[..., half_dim:]
    k1 = k_rotated[..., :half_dim]
    k2 = k_rotated[..., half_dim:]
    if cos.shape[-1] != half_dim:
        # Trim cos/sin to match half_dim
        cos = cos[..., :half_dim]
        sin = sin[..., :half_dim]

    # Apply rotary embedding using our kernel
    rotary_kernel.apply_rotary(q1, q2, cos, sin, q1, q2, False)
Contributor


Did you try to benchmark the performance with and without this kernel?

Contributor Author


Yes. On Intel XPU, a single rotary op takes 0.22 ms, and it drops to 0.1 ms after applying this patch, a more than 2x speedup.
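For reference, a minimal micro-benchmark harness of the kind that produces per-op numbers like these (a hypothetical sketch, not the harness actually used; on a GPU/XPU you would additionally synchronize the device around the timed region, e.g. with torch.cuda.synchronize() or torch.xpu.synchronize()):

```python
import time

def bench_ms(fn, *args, iters=100, warmup=10):
    """Return mean wall-clock time per call of fn(*args), in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm up caches / JIT before timing
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters * 1e3
```

Without device synchronization, asynchronous kernel launches would make wall-clock numbers meaningless on accelerators, which is why the synchronize step matters in the real measurement.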

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: dots1, qwen3, qwen3_moe, qwen3_omni_moe
