Conversation

wallashss
Collaborator

Description

This PR is a follow-up to #478.

It updates the remaining tests that use the function check_output_against_hf, replacing it with the new validate_vllm_vs_hf_output, which does "everything": it generates output for both vLLM and HF, uses the HF results as the baseline for golden token injection, and then compares the results of the two systems.
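Roughly, the flow of the new helper looks like this (the sketch below is illustrative; the callable names and signature are assumptions, not the actual implementation in the test utilities):

def validate_vllm_vs_hf_output(prompts, max_new_tokens,
                               hf_generate, vllm_generate, compare,
                               abs_tol=0.1):
    # Illustrative sketch only -- the real helper lives in the shared test utils.
    # 1. HF output is both the correctness baseline and the source of the
    #    "golden" tokens injected into the vLLM run.
    hf_results = hf_generate(prompts, max_new_tokens)
    # 2. Generate with vLLM on Spyre, injecting the HF tokens as golden tokens.
    vllm_results = vllm_generate(prompts, max_new_tokens, golden_tokens=hf_results)
    # 3. Compare the outputs of the two systems within the configured tolerance.
    compare(hf_results, vllm_results, abs_tol=abs_tol)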

I also included these environment variables, to get more flexibility when setting up the tests (a sketch of how they might be consumed follows the list):

  • VLLM_SPYRE_TEST_DISABLE_GOLDEN_TOKEN - disables the golden token injection
  • VLLM_SPYRE_TEST_ABS_TOL - overrides the constant ISCLOSE_ABS_TOL
  • VLLM_SPYRE_TEST_QUANTIZED_ABS_TOL - overrides the constant ISCLOSE_ABS_TOL_QUANTIZATION
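A minimal sketch of how these variables might be consumed by the test utilities (the default values shown are placeholders, not the real constants):

import os

DISABLE_GOLDEN_TOKEN = os.getenv("VLLM_SPYRE_TEST_DISABLE_GOLDEN_TOKEN", "0") == "1"
ISCLOSE_ABS_TOL = float(os.getenv("VLLM_SPYRE_TEST_ABS_TOL", "0.1"))  # placeholder default
ISCLOSE_ABS_TOL_QUANTIZATION = float(
    os.getenv("VLLM_SPYRE_TEST_QUANTIZED_ABS_TOL", "0.2"))  # placeholder default

So a run could, for example, loosen the quantized tolerance with something like VLLM_SPYRE_TEST_QUANTIZED_ABS_TOL=0.3 python -m pytest ....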


github-actions bot commented Oct 7, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@wallashss
Collaborator Author

bot:test

Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
seqs_max_tokens = [10, 13, 2]
prompts_lengths = [49, 41, 5]
seqs_max_tokens = [33, 36, 2]
prompts_lengths = [49, 41, 32]
Collaborator
Why do these values need to be updated? I think we had tried to keep these pretty small, because when running on CPU it takes quite a while to generate tokens. The CB tests used to take almost 20 minutes on GitHub Actions.

Collaborator Author

oops... that's a good question.

Because of this optimization: #262

Specifically this line:

n_fully_padded_blocks = math.floor(

We have to use longer prompts and move the tkv forward to match the expected number of blocks. That's the only way I found to update this test.
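Very roughly, blocks that consist purely of padding are skipped, so the block counts the test expects depend on the prompt length and the tkv. A toy illustration (assumed block size and a simplified stand-in for the expression in #262, not the exact formula):

import math

# Toy numbers only, to show why prompt length changes the block count.
block_size = 64
tkv = 70                                                    # illustrative decode position
short_prompt_blocks = math.floor((tkv - 5) / block_size)    # prompt length 5  -> 1 fully padded block
long_prompt_blocks = math.floor((tkv - 32) / block_size)    # prompt length 32 -> 0 fully padded blocks

With the short prompt a whole block of padding gets skipped, which shifts the per-step block counts these tests assert on; the longer prompts keep the padding below a full block.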

Collaborator Author

But... maybe I can revert the test changes in this file. We won't run with the full model anyway. What do you think?

Collaborator Author

I did a quick benchmark to verify the impact on the tests, as suggested by @maxdebayser:

I ran this in a dev pod:

python -m pytest -vs --disable-warnings --forked tests/e2e/test_spyre_cb_scheduler_steps.py -m cpu 

Original

========== 34 passed, 68 deselected, 72 warnings in 720.32s (0:12:00) ==========
========== 34 passed, 68 deselected, 72 warnings in 584.42s (0:09:44) ==========
========== 34 passed, 68 deselected, 72 warnings in 516.96s (0:08:36) ==========

Updated

========== 34 passed, 68 deselected, 72 warnings in 538.94s (0:08:58) ==========
========== 34 passed, 68 deselected, 72 warnings in 541.26s (0:09:01) ==========
========== 34 passed, 68 deselected, 72 warnings in 544.73s (0:09:04) ==========

Interesting that one of the original runs took longer than expected, but it looks like that was an outlier.

Collaborator

I'm still very confused 😕

Are you saying that

  1. these tests need to be updated to properly test that optimization, or
  2. that they need to be updated in order to add the golden token injection?

If (1) then we should separate that change into a different PR, if (2) then I don't get why adding GTI would cause the smaller sequences to fail

Collaborator

ah, nice benchmark! For -m cpu we should run without --forked, but even then the cpus on GHA are way slower than the ones on our clusters so we won't really see comparable results

Collaborator

Oh but these tests aren't even using golden token injection? Because we're not using validate_vllm_vs_hf_output?

Collaborator Author

I'm still very confused 😕

Are you saying that

  1. these tests need to be updated to properly test that optimization, or
  2. that they need to be updated in order to add the golden token injection?

If (1) then we should separate that change into a different PR, if (2) then I don't get why adding GTI would cause the smaller sequences to fail

Sorry about that. I think it is more like (1): I needed to update them to have more context so that we get smaller errors with GTI when using the quantized full model.

Oh but these tests aren't even using golden token injection? Because we're not using validate_vllm_vs_hf_output?

They don't use this new method, but they already use GTI (it is already on main). Since they generate using the engine directly to get information step by step, they won't be compatible with validate_vllm_vs_hf_output, which uses the LLM class to generate.

Since there are a lot of caveats with these tests, I am thinking of reverting these changes (the tests already have GTI), and we can plan better later how to actually get more benefit from GTI.
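For context, here is a rough sketch of the two generation paths being contrasted; this is generic vLLM API usage for illustration only, not the actual vllm-spyre test code:

from vllm import LLM, LLMEngine, EngineArgs, SamplingParams

params = SamplingParams(max_tokens=8)

# Path used by validate_vllm_vs_hf_output: the high-level LLM class, which
# only exposes the final outputs.
llm = LLM(model="some/model")  # placeholder model name
outputs = llm.generate(["Hello"], params)

# Path used by the scheduler-step tests: drive an engine directly, so that
# scheduler state (tkv, block usage, waiting/running queues) can be checked
# after every step.
engine = LLMEngine.from_engine_args(EngineArgs(model="some/model"))
engine.add_request("req-0", "Hello", params)
while engine.has_unfinished_requests():
    step_outputs = engine.step()
    # the real tests assert on per-step scheduler state here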

Collaborator

ah, gotcha. Thanks for the detailed explanation!

@wallashss
Collaborator Author

bot:test

wallashss requested a review from joerunde on October 10, 2025 at 00:21
Signed-off-by: Wallas Santos <[email protected]>