
Conversation


@OsirisDuan OsirisDuan commented Nov 20, 2025

What this PR does / why we need it?

This PR introduces a new Triton kernel, `_causal_conv1d_update_kernel_no_cache_len_no_mtp`, to support efficient causal 1D convolution updates in Qwen3-Next. The kernel is integrated into `causal_conv1d_update_npu`, enabling better performance on Ascend NPU hardware.
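For context, the operation being optimized is a single decode-step update of a depthwise causal 1D convolution: the newly arrived tokens are convolved against a small per-channel filter while a cached window of past inputs is rolled forward. Below is a minimal PyTorch sketch of these semantics; the function name, shapes, and in-place state update are illustrative assumptions, not the vLLM Ascend implementation:

```python
import torch

def causal_conv1d_update_ref(x, conv_state, weight, bias=None):
    """Reference semantics of one causal conv1d update step (assumes width > 1).

    x:          (batch, dim, seq_len)   newly arrived tokens
    conv_state: (batch, dim, width - 1) cached tail of past inputs, updated in place
    weight:     (dim, width)            one depthwise filter per channel
    bias:       (dim,) or None
    """
    batch, dim, seq_len = x.shape
    width = weight.shape[1]
    # Prepend the cached context so every output position sees exactly
    # `width` past inputs (causal: no lookahead).
    x_full = torch.cat([conv_state, x], dim=-1)  # (batch, dim, width - 1 + seq_len)
    # Depthwise causal convolution: groups=dim applies each channel's filter
    # to that channel only; output length is exactly seq_len.
    out = torch.nn.functional.conv1d(
        x_full, weight.unsqueeze(1), bias=bias, groups=dim
    )
    # Roll the cache forward so the next call continues seamlessly.
    conv_state.copy_(x_full[..., -(width - 1):])
    return out

# Illustrative usage with hypothetical sizes: one new token per sequence.
x = torch.randn(2, 8, 1)       # (batch=2, dim=8, seq_len=1)
state = torch.zeros(2, 8, 3)   # width - 1 = 3 cached positions
w = torch.randn(8, 4)          # width = 4
y = causal_conv1d_update_ref(x, state, w)  # -> (2, 8, 1)
```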

Does this PR introduce any user-facing change?

No. This is an internal, implementation-level optimization: it improves end-to-end model inference performance but does not change the user-facing API or how the function is invoked.

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new Triton kernel `_causal_conv1d_update_kernel_no_cache_len_no_mtp` for a specific case of causal 1D convolution, and integrates it into the existing `causal_conv1d_update_npu` function. While the addition is a good optimization, I've found a critical bug in the implementation of the new kernel where it fails to correctly write outputs for sequences longer than one token, causing data loss. My review includes a specific comment with a code suggestion to fix this issue.

Comment on lines +538 to +543
```python
tl.store(
    out_ptr
    + pid * out_batch_stride
    + (doffs + tl.arange(0, DIM_BLOCK)) * out_len,
    result,
)
```

critical

There is a critical bug in this loop. The `tl.store` operation writes to a memory location that does not depend on the loop variable `i`. As a result, for sequences with `seq_len > 1`, the output for each token is written to the same location, overwriting the previous one. For example, with `seq_len = 3`, all three iterations store to the same base offset, so only the result for the last token (`i = seq_len - 1`) is preserved, leading to incorrect convolution output. To fix this, include the token index `i` in the output pointer calculation so that each token's result is stored at its correct position.

Suggested change

```diff
 tl.store(
     out_ptr
     + pid * out_batch_stride
-    + (doffs + tl.arange(0, DIM_BLOCK)) * out_len,
+    + (doffs + tl.arange(0, DIM_BLOCK)) * out_len + i,
     result,
 )
```
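To make the overwrite concrete, here is a small NumPy emulation of the store loop for a single program instance and a one-element channel block; `emulate_store` and its shapes are hypothetical, purely for illustration:

```python
import numpy as np

def emulate_store(seq_len, out_len, add_token_index):
    """Emulate the kernel's store loop; -1 marks positions never written."""
    out = np.full(out_len, -1)
    for i in range(seq_len):
        result = i * 10      # stand-in for the i-th token's conv result
        addr = 0 * out_len   # base offset: does not depend on i
        if add_token_index:
            addr += i        # the suggested fix
        out[addr] = result
    return out

print(emulate_store(3, 3, add_token_index=False))  # [20 -1 -1]: tokens 0 and 1 lost
print(emulate_store(3, 3, add_token_index=True))   # [ 0 10 20]: each token preserved
```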

@OsirisDuan OsirisDuan changed the title Add _causal_conv1d_update_kernel_no_cache_len_no_mtp [task] Add causal_conv1d_update triton kernel Nov 20, 2025
@github-actions

This pull request has conflicts; please resolve them before we can evaluate it.

@OsirisDuan OsirisDuan force-pushed the 251120_QWen3-Next_conv1d_update branch from e6c44b3 to 6d1ffa3 on December 1, 2025 06:03
@OsirisDuan OsirisDuan changed the title [task] Add causal_conv1d_update triton kernel [feat] Add causal_conv1d_update triton kernel Dec 8, 2025