@lhtin
Contributor

@lhtin lhtin commented Sep 29, 2025

When launching in a multi-node environment (e.g., TP16), ParallelConfig automatically selects ray as the distributed_executor_backend. However, when async scheduling is enabled, the default value of distributed_executor_backend is prematurely set to mp, causing a launch failure like the one below. This fix moves the check to after the backend has been auto-selected.

image

Currently, async scheduling (primarily the full overlap feature) does not support Ray as a backend (see the error below). Support for this can be added in a future PR.

image
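
For illustration, the ordering described above can be sketched as follows. This is a minimal sketch and not the actual vLLM code: the class `ParallelConfigSketch`, the method `resolve_backend`, the `gpus_per_node` field, the `"uni"` placeholder, and the node-count heuristic are all assumptions made for the example. Only the backend names `ray` and `mp` and the "select first, then check" ordering come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for a parallel config; not the real vLLM class."""

    world_size: int
    gpus_per_node: int
    async_scheduling: bool = False
    distributed_executor_backend: Optional[str] = None

    def resolve_backend(self) -> str:
        # Step 1: auto-select the backend when the user did not set one.
        # Assumed heuristic: more workers than one node can hold means "ray".
        if self.distributed_executor_backend is None:
            if self.world_size > self.gpus_per_node:
                self.distributed_executor_backend = "ray"
            elif self.world_size > 1:
                self.distributed_executor_backend = "mp"
            else:
                # Placeholder name for a single-process executor.
                self.distributed_executor_backend = "uni"

        # Step 2: only now check async scheduling against the chosen backend,
        # instead of forcing "mp" before auto-selection has run.
        if self.async_scheduling and self.distributed_executor_backend == "ray":
            raise ValueError(
                "async scheduling does not support the ray backend yet"
            )
        return self.distributed_executor_backend
```

With this ordering, a multi-node configuration without async scheduling resolves to `ray`, a single-node multi-GPU configuration with async scheduling resolves to `mp`, and a multi-node configuration with async scheduling fails with an explicit error rather than silently starting with a backend that cannot span nodes.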

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a launch failure for async scheduling in a multi-node environment by adjusting when the distributed executor backend is configured. The change correctly removes the premature default setting of the backend to mp. However, the new validation logic for supported backends with async scheduling seems to have some inconsistencies. I've added a comment with a suggestion to clarify this logic and make it consistent with the information provided in the pull request description.

@lhtin
Contributor Author

lhtin commented Sep 29, 2025

@WoosukKwon @benchislett Hello, could you please review this MR?

@lhtin lhtin force-pushed the fix-async-scheduling-with-ray branch from 2a3e01f to 5f213a1 on September 29, 2025 13:25
@mergify

mergify bot commented Oct 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @lhtin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 11, 2025
@lhtin lhtin force-pushed the fix-async-scheduling-with-ray branch from 5f213a1 to 5aa3adc on October 11, 2025 06:44
@mergify mergify bot removed the needs-rebase label Oct 11, 2025
@lhtin
Contributor Author

lhtin commented Oct 11, 2025

@WoosukKwon @benchislett Hi, could you take a look at this PR?

@lhtin lhtin changed the title from "[Async Scheduling] Fix error when enable async-scheduling in multi-node env" to "[BugFix] Fix error when enable async-scheduling in multi-node env" Oct 16, 2025
@lhtin lhtin changed the title from "[BugFix] Fix error when enable async-scheduling in multi-node env" to "[BugFix][Core] Fix error when enable async-scheduling in multi-node env" Oct 16, 2025
Collaborator

@benchislett benchislett left a comment

LGTM, one grammar nit

@lhtin lhtin force-pushed the fix-async-scheduling-with-ray branch from f4bb3bd to 54e56a5 on October 17, 2025 02:25
lhtin and others added 3 commits October 17, 2025 17:56
Co-authored-by: Benjamin Chislett <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
@lhtin lhtin force-pushed the fix-async-scheduling-with-ray branch from 54e56a5 to bc67b60 on October 17, 2025 09:56
@benchislett benchislett added the bug and ready labels Oct 17, 2025
@njhill njhill enabled auto-merge (squash) October 17, 2025 20:52
@njhill njhill merged commit 6367bde into vllm-project:main Oct 17, 2025
50 of 51 checks passed
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
adabeyta pushed a commit to adabeyta/vllm that referenced this pull request Oct 20, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
Signed-off-by: Alberto Perdomo <[email protected]>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
Signed-off-by: 0xrushi <[email protected]>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
…nv (vllm-project#25887)

Signed-off-by: Lehua Ding <[email protected]>
Signed-off-by: Lehua Ding <[email protected]>
Co-authored-by: Benjamin Chislett <[email protected]>

Labels

bug, ready


3 participants