
Conversation

@LucasWilkinson (Collaborator) commented Dec 3, 2025

Since #28579 we could end up building attention metadata with padded ubatch slices for PIECEWISE/NONE cudagraphs. Because CUTLASS MLA assumes num_decodes_reqs == num_decode_tokens, and only the tokens were padded, this could trigger backend assertions. This PR fixes it by making sure we only use padded ubatch slices for metadata construction when using FULL cudagraphs.
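
Roughly, the selection this change moves metadata construction toward looks like the sketch below (illustrative only; cudagraph_mode, unpadded_slices, and padded_slices are placeholder names, not the exact vLLM code paths):

```python
# Illustrative sketch of the fix described above (hypothetical names, not the
# actual vLLM code paths).

def slices_for_attn_metadata(cudagraph_mode, unpadded_slices, padded_slices):
    # CUTLASS MLA assumes one decode token per decode request, so padding only
    # the token dimension of a ubatch slice breaks that invariant. Only FULL
    # cudagraphs need the padded shapes baked into attention metadata;
    # PIECEWISE/NONE keep the unpadded slices.
    if cudagraph_mode == "FULL":
        return padded_slices
    return unpadded_slices
```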

Failing CI: https://buildkite.com/organizations/vllm/pipelines/ci/builds/41668/jobs/019ae1ba-2178-4a96-adba-97fb4a46fe4f/log

Test Plan:

python -m pytest tests/v1/distributed/test_dbo.py was failing on B200 machines (it was passing on H100s because the backends used there do not assert on this).

Passes now.

@LucasWilkinson LucasWilkinson added ready ONLY add when PR is ready to merge/full CI is needed ready-run-all-tests Trigger CI with all tests for wide-ranging PRs labels Dec 3, 2025
@mergify mergify bot added the v1 label Dec 3, 2025
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 213 to +215
    )

    # Don't microbatch unless every other DP worker is also microbatching
    if not should_ubatch:
        return (None, num_tokens_after_padding)

    # This doesn't actually pad the ubatch slices. It just initializes the
    # split point to the padded value so that padding can be applied
    # to the second ubatch in pad_out_ubatch_slice after attention
    # metadata creation
    assert num_tokens_after_padding is not None
    num_tokens_padded = int(num_tokens_after_padding[0].item())
    token_split_point = int(num_tokens_padded) // 2

    assert num_scheduled_tokens_per_request is not None
    ubatch_slices = create_ubatch_slices(
        num_scheduled_tokens_per_request, token_split_point
    )
    ubatch_slices = _pad_out_ubatch_slice(ubatch_slices, num_tokens_padded)
    assert sum(s.num_tokens for s in ubatch_slices) == num_tokens_padded

-   return (ubatch_slices, num_tokens_after_padding)
+   return (should_ubatch, num_tokens_after_padding)


P1: Update DP coordination call sites for bool return

The new coordinate_batch_across_dp now returns a (bool, Tensor) pair (return (should_ubatch, num_tokens_after_padding)), but vllm/v1/spec_decode/eagle.py::_pad_batch_across_dp still expects the first value to be None when microbatching is disabled and asserts ubatch_slices is None. With data-parallel EAGLE runs this now receives False instead, triggering the assertion before any work is done. The call sites at eagle.py:1261/1270 need to be adjusted to the new return type to avoid failing every DP EAGLE execution.
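
A minimal sketch of the mismatch described above (simplified and hypothetical; not the actual eagle.py code):

```python
# Simplified illustration of the call-site mismatch (not the real vLLM code).

def coordinate_batch_across_dp_new(should_ubatch, num_tokens_after_padding):
    # New behavior: the first element is a bool, never Optional[UbatchSlices].
    return (should_ubatch, num_tokens_after_padding)

def old_style_caller(result):
    ubatch_slices, num_tokens_after_padding = result
    # Old expectation: the first element is None when microbatching is
    # disabled. With the new bool return this receives False, so the assert
    # fires before any work is done.
    assert ubatch_slices is None
```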


@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses a bug causing an assertion failure by ensuring that padded microbatch slices are used for metadata construction only when the cudagraph_mode is FULL. The changes are well-implemented and correctly resolve the issue.

The key modifications include:

  • Refactoring the microbatch slice creation logic from dp_utils.py to a new maybe_create_ubatch_slices function in ubatch_utils.py, which now provides both padded and unpadded slices.
  • Updating gpu_model_runner.py to conditionally use the correct slices for attention metadata based on the cudagraph_mode.
  • Simplifying coordinate_batch_across_dp to only decide whether to microbatch, not how to slice.
  • Fixing a subtle bug where a padded token count was incorrectly passed as an unpadded count.

The code is now more robust, and the refactoring improves the separation of concerns. The fix is sound, and I have no further recommendations.

@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses a bug related to Dual Batch Overlap (DBO) that caused an assertion failure (assert B_block_table == B_q) with certain cudagraph modes. The root cause was incorrect handling of padded microbatch slices, particularly for PIECEWISE and NONE cudagraphs where only token slices were being padded, leading to shape mismatches in attention metadata.

The fix is comprehensive and well-structured:

  1. The logic for creating and padding microbatch slices has been refactored and centralized into a new maybe_create_ubatch_slices function in ubatch_utils.py. This function now correctly generates both padded and unpadded versions of the slices.
  2. The padding logic itself has been corrected to pad both request and token slices, resolving the shape mismatch that caused the assertion (see the sketch after this review).
  3. Crucially, the gpu_model_runner now conditionally uses the padded slices for attention metadata construction only when using FULL cudagraphs, which is the correct behavior.
  4. A related bug was also fixed where sequence-parallelism-padded token counts were incorrectly used to determine microbatching thresholds.

Overall, the changes are correct, improve code clarity through refactoring, and directly address the reported issue. The test plan confirms the fix. The code quality is high.
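
For point 2 above, a rough sketch of padding both the request and the token slice of the second ubatch (field and helper names are assumptions, not the real ubatch_utils.py code):

```python
# Hypothetical sketch: extend the second ubatch's token range *and* its request
# range, so decode-only backends still see matching request and token counts.
from dataclasses import dataclass


@dataclass
class UbatchSlice:
    request_slice: slice
    token_slice: slice


def pad_out_second_ubatch(ubatch: UbatchSlice, num_total_reqs: int,
                          num_total_tokens: int) -> UbatchSlice:
    return UbatchSlice(
        request_slice=slice(ubatch.request_slice.start, num_total_reqs),
        token_slice=slice(ubatch.token_slice.start, num_total_tokens),
    )
```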

@mergify (bot) commented Dec 3, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @LucasWilkinson.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Lucas Wilkinson <[email protected]>
@robertgshaw2-redhat merged commit c8ab988 into vllm-project:main Dec 4, 2025
136 checks passed
Labels: ready, ready-run-all-tests, speculative-decoding, v1
