Fix issue with async_scheduling when dealing with chunked input #360
Conversation
Signed-off-by: Tianmu Li <[email protected]>
🚧 CI Blocked: The main CI workflow was not started for the following reason:
Signed-off-by: Tianmu Li <[email protected]>
Force-pushed from e2cc7ce to f075944.
if structured_output or self.use_async_scheduling:
    logits_append = torch.tensor([torch.sum(prompt_len) - 1],
                                 device=token_ids.device,
                                 dtype=torch.int32)
I didn't get this part: why torch.sum(prompt_len) - 1 instead of something like len(req_id) - logits_indices.shape[0]?
Why is logits_indices shorter? Because we skipped num_decodes, right? And why is the padding appended after rather than before?
This is for a chunked prompt, which shouldn't generate a new token yet. In gpu_model_runner, a new token still gets generated for the incomplete prompt and then gets discarded. This is to align with that behavior.
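A minimal sketch of that generate-then-discard flow, assuming a packed prompt layout; the names sampled_tokens and invalid_req_indices here are illustrative, not the plugin's actual variables:

import torch

# Hypothetical prefill batch: two complete prompts plus one chunked prompt at the end.
prompt_len = torch.tensor([4, 3, 5], dtype=torch.int32)  # last prompt is incomplete
sampled_tokens = torch.tensor([101, 102, 103])            # one token sampled per prompt

# The chunked prompt still gets a token sampled (mirroring gpu_model_runner),
# but its batch index is recorded so the token can be discarded afterwards.
invalid_req_indices = [prompt_len.shape[0] - 1]

valid_tokens = [tok for i, tok in enumerate(sampled_tokens.tolist())
                if i not in invalid_req_indices]
# valid_tokens == [101, 102]; the chunked prompt's token was dropped.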
This depends on the fact that there can only be one incomplete prompt, and that prompt is always the last one if it exists.
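As a rough illustration of the index math, a sketch assuming prompt tokens are packed contiguously and prompt_len holds the per-prompt token counts, so the incomplete prompt (if any) sits at the end:

import torch

prompt_len = torch.tensor([4, 3, 5], dtype=torch.int32)  # last entry is the chunked prompt
# Flat position of the last token currently present for the last prompt:
last_token_pos = torch.sum(prompt_len) - 1                # 4 + 3 + 5 - 1 == 11
# Appending this single index yields one extra logit for the chunked prompt,
# matching the token that gpu_model_runner would generate and later discard.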
if self.use_async_scheduling:
    # Discard partial prefill logit for async scheduling
    # Depends on 1 decode token/batch
    invalid_req_indices.append(num_decodes + idx)
Maybe do something like this, so it will be easier to understand?
prefill_start_idx = num_decodes
invalid_req_indices.append(prefill_start_idx + idx)
Added clarification.
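A short sketch of how the clarified indexing could read in context (hypothetical variable names; assumes decode requests precede prefill requests in the batch, with one token per decode, per the code comment):

num_decodes = 2                            # decode requests come first in the batch
prefill_is_partial = [False, False, True]  # third prefill is a chunked prompt

invalid_req_indices = []
for idx, is_partial in enumerate(prefill_is_partial):
    if is_partial:
        # Prefill requests start right after the decode requests,
        # so the partial prefill's batch index is num_decodes + idx.
        prefill_start_idx = num_decodes
        invalid_req_indices.append(prefill_start_idx + idx)
# invalid_req_indices == [4]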
Signed-off-by: Tianmu Li <[email protected]>
/run-gaudi-tests
/run-gaudi-tests
✅ CI Passed: All checks passed successfully against the following vllm commit:
Cherry-pick of #359