Conversation

sducouedic
Collaborator

The continuous batching scheduler checks that the request (i.e., prompt + requested tokens) fits within max_context_len, but it does not account for the fact that the last generated token is a "free" one: it is returned immediately and never needs to be stored in a block.
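For illustration, a minimal sketch of the adjusted check, assuming hypothetical names (prompt_len, max_tokens, max_context_len) rather than the actual vllm-spyre scheduler fields:

# Sketch only: names below are illustrative, not the real vllm-spyre code.
def fits_context(prompt_len: int, max_tokens: int,
                 max_context_len: int) -> bool:
    # The old check assumed every requested token needs block storage:
    #     prompt_len + max_tokens <= max_context_len
    # The last generated token is returned immediately and never
    # written into a block, so one token of slack is allowed:
    return prompt_len + max_tokens - 1 <= max_context_len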

Signed-off-by: Sophie du Couédic <[email protected]>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@yannicks1
Collaborator

related to #320

yannicks1 marked this pull request as draft July 24, 2025 12:35
sducouedic changed the title from "[CB] fix scheduler free last token" to "[CB] fix scheduler constraint max tokens" Jul 25, 2025
yannicks1 added a commit that referenced this pull request Oct 10, 2025
We had a (known) inconsistency: allowing the additional token in
`platform.validate_request`, but not in `scheduler.can_schedule()` (#332
would allow it in the scheduler as well). However, since the upstream
vLLM version that vllm-spyre is tied to does not currently allow the
additional token, this PR removes the inconsistency and establishes the
same behavior as upstream.

Note: as soon as the upstream version this repo is tied to allows the
additional token, we can revert this PR and merge #332 (along with some
unit tests).

cc @sducouedic

Signed-off-by: Yannick Schnider <[email protected]>
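
For illustration, a hedged sketch of the two checks this commit aligns; the function names and signatures are hypothetical, not the real vllm-spyre APIs:

# Sketch only: hypothetical names, not the actual vllm-spyre signatures.
def validate_request(prompt_len: int, max_tokens: int,
                     max_model_len: int) -> bool:
    # Previously allowed the extra "free" token:
    #     prompt_len + max_tokens - 1 <= max_model_len
    # Now matches upstream vLLM, which does not allow it:
    return prompt_len + max_tokens <= max_model_len

def can_schedule(prompt_len: int, max_tokens: int,
                 max_model_len: int) -> bool:
    # The scheduler check never allowed the extra token; after this
    # commit both checks agree:
    return prompt_len + max_tokens <= max_model_len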