Conversation


@Wei-Lin-Intel Wei-Lin-Intel commented Oct 24, 2025

Conv1D's compute precision for Qwen3-Next on G2 should be kept as float, so the Conv1D weight should be cast to fp32.

This PR removes the unnecessary bf16->fp32 cast from every forward call in flat linear attention. Instead, the cast is performed once on the first call (normally during the profile run), and the original bf16 conv1d weight is then deleted since it is no longer used.
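
A minimal sketch of this lazy one-time cast pattern; the class name, shapes, and conv details below are illustrative assumptions, not the actual vLLM-HPU implementation:

```python
import torch
import torch.nn.functional as F

class FlatLinearAttentionConv(torch.nn.Module):
    """Sketch of the one-time bf16 -> fp32 conv1d weight cast (illustrative only)."""

    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        # Depthwise conv1d weight as loaded from the checkpoint, in bf16.
        self.conv1d_weight = torch.nn.Parameter(
            torch.randn(dim, 1, kernel_size, dtype=torch.bfloat16),
            requires_grad=False,
        )
        self._conv1d_weight_fp32 = None  # created lazily on the first forward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._conv1d_weight_fp32 is None:
            # Cast once, normally hit during the profile run ...
            self._conv1d_weight_fp32 = self.conv1d_weight.data.float()
            # ... then drop the bf16 copy, since it is never read again.
            self.conv1d_weight.data = torch.empty(0)
        w = self._conv1d_weight_fp32
        # Depthwise causal conv in fp32; trim the extra right-side padding.
        y = F.conv1d(x.float(), w, groups=w.shape[0], padding=w.shape[-1] - 1)
        return y[..., : x.shape[-1]]
```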


@czhu15 czhu15 left a comment


LGTM except for one minor question.

attn[..., i, :i] = row + (row.unsqueeze(-1) * sub).sum(-2)
attn = attn + torch.eye(chunk_size, dtype=attn.dtype, device=attn.device)
row = attn[..., i, :i].contiguous()
sub = attn[..., :i, :]


Why does row need .contiguous() but sub doesn't?

@Wei-Lin-Intel (Author) replied


row is sliced along the last dim, which leaves its elements at non-contiguous addresses, so it is better to call .contiguous() on it.
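
A small illustration of that point, with made-up shapes (e.g. [heads, chunk_size, chunk_size], not the actual attn tensor): slicing the last dim yields a strided view, and .contiguous() packs it into a dense buffer.

```python
import torch

attn = torch.zeros(4, 8, 8)  # illustrative [heads, chunk_size, chunk_size]
i = 5

row_view = attn[..., i, :i]                   # slice along the last dim -> strided view
print(row_view.shape, row_view.stride())      # torch.Size([4, 5]) (64, 1)
print(row_view.is_contiguous())               # False: the 5-element pieces sit far apart
print(row_view.contiguous().is_contiguous())  # True: copied into a packed buffer
```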

@czhu15 czhu15 merged commit 4f5009a into HabanaAI:aice/v1.22.0 Oct 29, 2025
1 check passed