
Conversation


@ckadner ckadner commented Sep 5, 2025

Description

  • Add config file for models and runtime parameters
  • Use configuration file in documentation
  • Add validation code to compare requested model and runtime parameters with supported configurations
  • Log warning when requested configuration is not supported
    Example: WARNING 09-05 18:46:43 [runtime_config_validator.py:107] The requested configuration is not supported for model 'ibm-ai-platform/micro-g3.3-8b-instruct-1b': RuntimeConfiguration(platform=, cb=True, tp_size=1, max_model_len=128, max_num_seqs=2, num_blocks=0, warmup_shapes=None)
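
A minimal sketch of the warn-don't-fail behavior described above, with `RuntimeConfiguration` fields mirroring the example warning message. All names and defaults here are illustrative, not the actual `runtime_config_validator.py` implementation:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("runtime_config_validator")

@dataclass(frozen=True)
class RuntimeConfiguration:
    # Fields mirror the example warning message; defaults are illustrative.
    platform: str = ""
    cb: bool = False
    tp_size: int = 1
    max_model_len: int = 128
    max_num_seqs: int = 2
    num_blocks: int = 0
    warmup_shapes: Optional[tuple] = None

def validate(model: str, requested: RuntimeConfiguration, supported: dict) -> bool:
    """Log a warning (but do not fail) when the requested configuration
    is not among the known-good configurations for the model."""
    if requested not in supported.get(model, []):
        logger.warning(
            "The requested configuration is not supported for model '%s': %s",
            model, requested)
        return False
    return True
```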

TODO:

  • code cleanup
  • review/revise model-config YAML file structure
  • add a YAML field to ignore testing models/configurations for tiny model unit tests
  • what to use for num_blocks (cpu, gpu ...override)?
  • revise config validation logic and messaging
    • 2-stage config matching: top-level fields first, set containment for warmup_shapes second
  • update configs after release (candidate) testing
  • remove option to error out on unknown configuration
  • how to match models by name if they are mounted
  • integrate model/runtime configurations into tests (⚗️ draft supported model tests #435)
  • get_warmup_shapes_from_envs() does not yield same as platform.py:cls._warmup_shapes
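
The 2-stage matching mentioned in the TODO could look roughly like the sketch below: exact equality on the top-level fields, then set containment for the warmup shapes. Field names and data layout are assumptions for illustration only:

```python
# Hypothetical two-stage matcher: compare top-level fields first,
# then check requested warmup shapes are a subset of the supported ones.
TOP_LEVEL_FIELDS = ("platform", "cb", "tp_size", "max_model_len", "max_num_seqs")

def matches(requested: dict, supported: dict) -> bool:
    # Stage 1: all top-level fields must match exactly.
    if any(requested.get(f) != supported.get(f) for f in TOP_LEVEL_FIELDS):
        return False
    # Stage 2: every requested warmup shape must be among the supported
    # shapes (set containment, not equality).
    requested_shapes = set(requested.get("warmup_shapes") or [])
    supported_shapes = set(supported.get("warmup_shapes") or [])
    return requested_shapes <= supported_shapes
```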

Review suggestions:

I wonder if it's feasible to test the warmup shapes like this. Maybe we could do something like:

  • in the known configuration file, only keep the upper bound
  • Validate that the prompts are multiples of 64
  • Validate that prompt + new_tokens <= max_model_len
  • Validate that the batch size is <= a tested upper bound.
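
Those checks could be sketched as a standalone helper. The 64-token multiple and the upper-bound comparisons come from the suggestion above; function and parameter names are hypothetical:

```python
def validate_warmup_shape(prompt_len: int, new_tokens: int, batch_size: int,
                          max_model_len: int, max_batch_size: int) -> list:
    """Return a list of problems; an empty list means the shape passes all checks."""
    problems = []
    # Prompt lengths must be multiples of 64.
    if prompt_len % 64 != 0:
        problems.append(f"prompt length {prompt_len} is not a multiple of 64")
    # prompt + new_tokens must fit within the tested max_model_len.
    if prompt_len + new_tokens > max_model_len:
        problems.append(f"prompt + new tokens ({prompt_len + new_tokens}) "
                        f"exceeds max_model_len ({max_model_len})")
    # Batch size must not exceed the tested upper bound.
    if batch_size > max_batch_size:
        problems.append(f"batch size {batch_size} exceeds tested "
                        f"upper bound {max_batch_size}")
    return problems
```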

Related Issues

#435

- Add config file for models and runtime parameters
- Add validation code to compare requested model and
  runtime parameters with supported configurations
- Log warning when requested configuration is not
  supported
- Use configuration file in documentation

Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft September 5, 2025 19:01
@vllm-project vllm-project deleted a comment from github-actions bot Sep 5, 2025
@ckadner ckadner marked this pull request as ready for review September 24, 2025 08:16
joerunde pushed a commit that referenced this pull request Sep 26, 2025
# Description

Document the supported models + configurations.

**Rendered preview:**

https://vllm-spyre--479.org.readthedocs.build/en/479/user_guide/supported_models.html#configurations


<details><summary>first stab</summary>

## Granite-3.3-8b-instruct (Precision: 16 bit)

**_Static Batching:_**

| Platform  | AIUs | Prompt Length | New Tokens | Batch Size | Use Case |
|-----------|------|---------------|------------|------------|----------|
| `ppc64le` | 1    | 2048          | 1024       | 16         | EE       |
| `ppc64le` | 4    | 6144          | 2048       | 1          | RAG      |
| `s390x`   | 4    | 7168          | 1024       | 1          | RAG      |

**_Continuous Batching:_**

| Platform  | AIUs | Context Length | Batch Size | Comments         |
|-----------|------|----------------|------------|------------------|
| `amd64`   | 1    | 8192           | 4          |                  |
| `amd64`   | 2    | 8192           | 4          |                  |
| `amd64`   | 4    | 8192           | 4          |                  |
| `amd64`   | 4    | 16384          | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 1    | 8192           | 4          |                  |
| `s390x`   | 2    | 8192           | 4          |                  |
| `s390x`   | 4    | 8192           | 4          |                  |
| `s390x`   | 4    | 8192           | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 4    | 16384          | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 4    | 16384          | 4          | `FLEX_DEVICE=VF` |


## Granite-3.3-8b-instruct-FP8 (Precision: 8 bit)

**_Continuous Batching:_**

| Platform | AIUs | Context Length | Batch Size | FLEX Device |
|----------|------|----------------|------------|-------------|
| `amd64`  | 4    | 16384          | 4          | `PF`        |
| `s390x`  | 4    | 8192           | 4          | `PF`        |
| `s390x`  | 4    | 16384          | 4          | `PF`        |
| `s390x`  | 4    | 16384          | 4          | `VF`        |

</details>



<details><summary>second stab</summary>

## Configurations

The following models have been verified to run on vLLM Spyre with the listed configurations.

### Decoder Models

**_Static Batching:_**

| Model          | Platform  | AIUs | Prompt Length | New Tokens | Batch Size | Edits   |
|----------------|-----------|------|---------------|------------|------------|---------|
| Granite-3.3-8b | `ppc64le` | 1    | 2048          | 1024       | 16         | remove? |
| Granite-3.3-8b | `ppc64le` | 4    | 6144          | 2048       | 1          | remove? |
| Granite-3.3-8b | `s390x`   | 4    | 7168          | 1024       | 1          | remove? |

**_Continuous Batching:_**

| Model                | Platform  | AIUs | Context Length | Batch Size | Edits   |
|----------------------|-----------|------|----------------|------------|---------|
| Granite-3.3-8b | `amd64` | 1 | 8192 | 4 | remove? |
| Granite-3.3-8b | `amd64` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b | `s390x` | 1 | 3072 | 16 | PELE |
| Granite-3.3-8b | `s390x` | 1 | 8192 | 4 | remove? |
| Granite-3.3-8b | `s390x` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b | `s390x` | 4 | 32768 | 32 | PELE |
| Granite-3.3-8b (FP8) | `amd64` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 4 | 8192 | 4 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b (FP8) | `ppc64le` | 1 | 3072 | 16 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 8192 | 16 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 32768 | 32 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 4096 | 32 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 32768 | 4 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 16384 | 8 | PELE |

### Encoder Models

| Model                               | Platform  | AIUs | Context Length | Batch Size | Edits |
|-------------------------------------|-----------|------|----------------|------------|-------|
| Granite-Embedding-125m-English | `s390x` | 1 | 512 | 1 | PELE |
| Granite-Embedding-125m-English | `s390x` | 1 | 512 | 64 | PELE |
| granite-embedding-278m-multilingual | `s390x` | 1 | 512 | 1 | PELE |
| granite-embedding-278m-multilingual | `s390x` | 1 | 512 | 64 | PELE |
| BAAI/bge-reranker-v2-m3 | `s390x` | 1 | 2048 | 1 | PELE |
| BAAI/bge-reranker-v2-m3 | `s390x` | 1 | 4096 | 1 | PELE |
| BAAI/bge-reranker-v2-m3 | `s390x` | 1 | 8192 | 1 | PELE |
| BAAI/bge-reranker-large | `s390x` | 1 | 512 | 1 | PELE |
| BAAI/bge-reranker-large | `s390x` | 1 | 512 | 64 | PELE |

## Model Files

| Model                               | Download                                                                           |
|-------------------------------------|------------------------------------------------------------------------------------|
| Granite-3.3-8b                      | [Download](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)             |
| Granite-3.3-8b (FP8)                |                                                                                    |
| Granite-Embedding-125m-English      | [Download](https://huggingface.co/ibm-granite/granite-embedding-125m-english)      |
| Granite-Embedding-278m-Multilingual | [Download](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) |
| BAAI/bge-reranker-v2-m3             | [Download](https://huggingface.co/BAAI/bge-reranker-v2-m3)                         |
| BAAI/bge-reranker-large             | [Download](https://huggingface.co/BAAI/bge-reranker-large)                         |

</details>

<details open><summary>final</summary>

<img width="514" height="835" alt="image"
src="https://github.com/user-attachments/assets/7ec1fd70-04f6-4586-873d-b531e4ef522a"
/>

</details>

## Related Issues

#445

---------

Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft September 30, 2025 19:48
Validate that the batch size is <= a tested upper bound

@ckadner ckadner marked this pull request as ready for review September 30, 2025 21:26
@ckadner ckadner changed the title WIP: Manage supported model configurations Manage supported model configurations Sep 30, 2025

ckadner commented Oct 6, 2025

Hi @maxdebayser I added validation code and unit tests for:

  • in the known configuration file, only keep the upper bound
  • test requested warmup_shapes against the upper bound for prompt length, batch size, and max_new_tokens
  • sum of prompt + max_new_tokens does not exceed the max_model_len of supported configs
  • prompt size is a multiple of 64

Kindly take another look? Thank you! 🙏🏻

@maxdebayser

@ckadner, I was having trouble explaining my thoughts as review comments, so I put them in code form: ckadner#19.

@maxdebayser

@ckadner, my assumptions aren't correct. Please disregard some of my previous comments about upper bounds.


@maxdebayser maxdebayser left a comment


I've left a small suggestion, but otherwise it LGTM


ckadner commented Oct 8, 2025

One last item to do:
