
@quanliu1991 quanliu1991 commented Nov 17, 2025

Add num_splits parameter for mha_varlen_fwd FA2 to support batch-invariant processing

Batch-invariant mode is currently unsupported with FA2 on SM80 GPUs, so the same input can produce different outputs.
This PR adds a num_splits parameter to mha_varlen_fwd. Setting num_splits=1 ensures consistent outputs when batch-invariant mode is enabled.
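
For readers wondering why the split count matters: floating-point addition is not associative, so reducing the same KV sequence in a different number of chunks can change the low-order bits of the result. The sketch below is plain Python, not the FA2 kernel. `attention_split_kv` and the toy data are hypothetical names used only for illustration, and it assumes a single query with scalar keys and values.

```python
import math
import random


def attention_split_kv(q, keys, values, num_splits):
    """Toy single-query attention reduced over `num_splits` KV chunks.

    Hypothetical illustration only: each chunk computes a partial softmax
    (local max, local denominator, local numerator), and the partials are
    then rescaled to a common max and combined.
    """
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    partials = []
    for start in range(0, n, chunk):
        scores = [q * k for k in keys[start:start + chunk]]
        local_max = max(scores)
        weights = [math.exp(s - local_max) for s in scores]
        denom = sum(weights)
        numer = sum(w * v for w, v in zip(weights, values[start:start + chunk]))
        partials.append((local_max, denom, numer))

    # Combine the partial results; the order and grouping of these additions
    # depends on num_splits, which is where bit-level differences can appear.
    global_max = max(m for m, _, _ in partials)
    denom = sum(d * math.exp(m - global_max) for m, d, _ in partials)
    numer = sum(x * math.exp(m - global_max) for m, _, x in partials)
    return numer / denom


random.seed(0)
q = 0.7
keys = [random.uniform(-4.0, 4.0) for _ in range(1024)]
values = [random.uniform(-1.0, 1.0) for _ in range(1024)]

for splits in (1, 2, 4, 8):
    # Results agree mathematically but may differ in the last few bits.
    print(splits, repr(attention_split_kv(q, keys, values, splits)))
```

If the split count is chosen heuristically from the workload shape, the same request could be reduced differently depending on what else is in the batch. Pinning num_splits=1 keeps the reduction order fixed, which is the property batch-invariant mode relies on.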

vllm-project/vllm#27433 (comment)
Verified on an A800 GPU with the Qwen-3 32B model.
Tested with --disable-cascade-attn disabled, under TP=1 and TP=2.
Outputs are consistent in both settings.

@anxiang1836

Thanks. I noticed that vLLM 0.11.2 was released today, but it doesn't support SM89 GPUs yet, right?
Do I need to wait for the next vLLM release after this PR is merged?
