[BugFix][Core] Fix error when enable async-scheduling in multi-node env #25887

lhtin · 2025-09-29T13:19:39Z

When launching in a multi-node environment (e.g., TP16), ParallelConfig automatically selects ray as the distributed_executor_backend. However, when async scheduling is enabled, the default value of distributed_executor_backend is prematurely set to mp before that auto-selection runs, causing a launch failure like the one below. This fix moves the check to after the backend is auto-selected.

Currently, async scheduling (primarily the full-overlap feature) does not support Ray as a backend (see the error below). Support for this can be added in a future PR.
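To make the ordering issue concrete, here is a minimal sketch of the intended flow. It is not the code in vllm/engine/arg_utils.py: `SketchConfig`, `resolve_backend`, the `gpus_per_node` field, and the error text are all hypothetical, and the auto-selection rule is simplified to "more workers than local GPUs means Ray, otherwise mp".

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the real config; field names are illustrative only.
@dataclass
class SketchConfig:
    world_size: int                # total workers requested, e.g. TP16 -> 16
    gpus_per_node: int             # GPUs available on the local node
    async_scheduling: bool = False
    distributed_executor_backend: Optional[str] = None  # None means auto-select


def resolve_backend(cfg: SketchConfig) -> str:
    """Auto-select the backend first, then validate it against async scheduling."""
    backend = cfg.distributed_executor_backend
    if backend is None:
        # Step 1: auto-selection (simplified assumption for this sketch).
        # More workers than local GPUs implies multi-node, which maps to Ray.
        backend = "ray" if cfg.world_size > cfg.gpus_per_node else "mp"

    # Step 2: only now check async scheduling, against the backend that was
    # actually chosen, instead of forcing "mp" before Step 1 has run.
    if cfg.async_scheduling and backend == "ray":
        raise ValueError(
            "async scheduling does not support the ray backend (sketch error)"
        )
    return backend
```

With this ordering, a single-node launch with async scheduling still resolves to mp, while a multi-node launch fails early with an explicit message rather than starting with a backend that cannot span nodes.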

gemini-code-assist

Code Review

This pull request aims to fix a launch failure for async scheduling in a multi-node environment by adjusting when the distributed executor backend is configured. The change correctly removes the premature default setting of the backend to mp. However, the new validation logic for supported backends with async scheduling seems to have some inconsistencies. I've added a comment with a suggestion to clarify this logic and make it consistent with the information provided in the pull request description.

vllm/engine/arg_utils.py

lhtin · 2025-09-29T13:24:32Z

@WoosukKwon @benchislett Hello, could you please review this PR?

mergify · 2025-10-11T06:39:10Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @lhtin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

lhtin · 2025-10-11T06:50:15Z

@WoosukKwon @benchislett Hi, could you take a look at this PR?

vllm/engine/arg_utils.py

benchislett

LGTM, one grammar nit


gemini-code-assist bot reviewed Sep 29, 2025

vllm/engine/arg_utils.py (outdated, resolved)

lhtin force-pushed the fix-async-scheduling-with-ray branch from 2a3e01f to 5f213a1 Compare September 29, 2025 13:25

mergify bot added the needs-rebase label Oct 11, 2025

lhtin force-pushed the fix-async-scheduling-with-ray branch from 5f213a1 to 5aa3adc Compare October 11, 2025 06:44

mergify bot removed the needs-rebase label Oct 11, 2025

lhtin changed the title ~~[Async Scheduling] Fix error when enable async-scheduling in multi-node env~~ [BugFix] Fix error when enable async-scheduling in multi-node env Oct 16, 2025

lhtin changed the title ~~[BugFix] Fix error when enable async-scheduling in multi-node env~~ [BugFix][Core] Fix error when enable async-scheduling in multi-node env Oct 16, 2025

lhtin mentioned this pull request Oct 16, 2025

[Perf][V1] Fully overlap model execution #23569

Merged

benchislett reviewed Oct 16, 2025

vllm/engine/arg_utils.py (resolved)

benchislett reviewed Oct 16, 2025

vllm/engine/arg_utils.py (outdated, resolved)

benchislett approved these changes Oct 16, 2025

lhtin force-pushed the fix-async-scheduling-with-ray branch from f4bb3bd to 54e56a5 Compare October 17, 2025 02:25

lhtin and others added 3 commits October 17, 2025 17:56

[Async Scheduling] Move the auto-selection of the backend to before the default selection.

c3f0e20

Signed-off-by: Lehua Ding <[email protected]>

Update vllm/engine/arg_utils.py

64b4325

Co-authored-by: Benjamin Chislett <[email protected]> Signed-off-by: Lehua Ding <[email protected]>

Address format error

bc67b60

Signed-off-by: Lehua Ding <[email protected]>

lhtin force-pushed the fix-async-scheduling-with-ray branch from 54e56a5 to bc67b60 Compare October 17, 2025 09:56

benchislett approved these changes Oct 17, 2025

benchislett added the bug and ready labels Oct 17, 2025

njhill enabled auto-merge (squash) October 17, 2025 20:52

njhill merged commit 6367bde into vllm-project:main Oct 17, 2025
50 of 51 checks passed

lhtin mentioned this pull request Nov 19, 2025

Support async scheduling with ray backend #29012

Draft
