
Conversation


@ckadner ckadner commented Sep 5, 2025

Description

  • Add config file for models and runtime parameters
  • Use configuration file in documentation
  • Add validation code to compare requested model and runtime parameters with supported configurations
  • Log warning when requested configuration is not supported
    Example: WARNING 09-05 18:46:43 [runtime_config_validator.py:107] The requested configuration is not supported for model 'ibm-ai-platform/micro-g3.3-8b-instruct-1b': RuntimeConfiguration(platform=, cb=True, tp_size=1, max_model_len=128, max_num_seqs=2, num_blocks=0, warmup_shapes=None)
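
A minimal sketch of the warn-don't-fail behavior described above, with `RuntimeConfiguration` fields mirroring the example warning message. All names and defaults here are illustrative, not the actual `runtime_config_validator.py` implementation:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("runtime_config_validator")

@dataclass(frozen=True)
class RuntimeConfiguration:
    # Fields mirror the example warning message; defaults are illustrative.
    platform: str = ""
    cb: bool = False
    tp_size: int = 1
    max_model_len: int = 128
    max_num_seqs: int = 2
    num_blocks: int = 0
    warmup_shapes: Optional[tuple] = None

def validate(model: str, requested: RuntimeConfiguration, supported: dict) -> bool:
    """Log a warning (but do not fail) when the requested configuration
    is not among the known-good configurations for the model."""
    if requested not in supported.get(model, []):
        logger.warning(
            "The requested configuration is not supported for model '%s': %s",
            model, requested)
        return False
    return True
```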

TODO:

  • code cleanup
  • review/revise model-config YAML file structure
  • add a YAML field to ignore testing models/configurations for tiny model unit tests
  • what to use for num_blocks (cpu, gpu ...override)?
  • revise config validation logic and messaging
    • 2-stage config matching: top-level fields first, set containment for warmup_shapes second
  • update configs after release (candidate) testing
  • remove option to error out on unknown configuration
  • how to match models by name if they are mounted
  • integrate model/runtime configurations into tests (⚗️ draft supported model tests #435)
  • get_warmup_shapes_from_envs() does not yield same as platform.py:cls._warmup_shapes
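
The 2-stage matching mentioned in the TODO could look roughly like the sketch below: exact equality on the top-level fields, then set containment for the warmup shapes. Field names and data layout are assumptions for illustration only:

```python
# Hypothetical two-stage matcher: compare top-level fields first,
# then check requested warmup shapes are a subset of the supported ones.
TOP_LEVEL_FIELDS = ("platform", "cb", "tp_size", "max_model_len", "max_num_seqs")

def matches(requested: dict, supported: dict) -> bool:
    # Stage 1: all top-level fields must match exactly.
    if any(requested.get(f) != supported.get(f) for f in TOP_LEVEL_FIELDS):
        return False
    # Stage 2: every requested warmup shape must be among the supported
    # shapes (set containment, not equality).
    requested_shapes = set(requested.get("warmup_shapes") or [])
    supported_shapes = set(supported.get("warmup_shapes") or [])
    return requested_shapes <= supported_shapes
```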

Review suggestions:

I wonder if it's feasible to test the warmup shapes like this. Maybe we could do something like:

  • in the known configuration file, only keep the upper bound
  • Validate that the prompts are multiples of 64
  • Validate that prompt + new_tokens <= max_model_len
  • Validate that the batch size is <= a tested upper bound.
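
Those checks could be sketched as a standalone helper. The 64-token multiple and the upper-bound comparisons come from the suggestion above; function and parameter names are hypothetical:

```python
def validate_warmup_shape(prompt_len: int, new_tokens: int, batch_size: int,
                          max_model_len: int, max_batch_size: int) -> list:
    """Return a list of problems; an empty list means the shape passes all checks."""
    problems = []
    # Prompt lengths must be multiples of 64.
    if prompt_len % 64 != 0:
        problems.append(f"prompt length {prompt_len} is not a multiple of 64")
    # prompt + new_tokens must fit within the tested max_model_len.
    if prompt_len + new_tokens > max_model_len:
        problems.append(f"prompt + new tokens ({prompt_len + new_tokens}) "
                        f"exceeds max_model_len ({max_model_len})")
    # Batch size must not exceed the tested upper bound.
    if batch_size > max_batch_size:
        problems.append(f"batch size {batch_size} exceeds tested "
                        f"upper bound {max_batch_size}")
    return problems
```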

Related Issues

#435

- Add config file for models and runtime parameters
- Add validation code to compare requested model and
  runtime parameters with supported configurations
- Log warning when requested configuration is not
  supported
- Use configuration file in documentation

Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft September 5, 2025 19:01
@vllm-project vllm-project deleted a comment from github-actions bot Sep 5, 2025
@ckadner ckadner marked this pull request as ready for review September 24, 2025 08:16
joerunde pushed a commit that referenced this pull request Sep 26, 2025
# Description

Document the supported models + configurations.

**Rendered preview:**

https://vllm-spyre--479.org.readthedocs.build/en/479/user_guide/supported_models.html#configurations


<details><summary>first stab</summary>

## Granite-3.3-8b-instruct (Precision: 16 bit)

**_Static Batching:_**

| Platform  | AIUs | Prompt Length | New Tokens | Batch Size | Use Case |
|-----------|------|---------------|------------|------------|----------|
| `ppc64le` | 1    | 2048          | 1024       | 16         | EE       |
| `ppc64le` | 4    | 6144          | 2048       | 1          | RAG      |
| `s390x`   | 4    | 7168          | 1024       | 1          | RAG      |

**_Continuous Batching:_**

| Platform  | AIUs | Context Length | Batch Size | Comments         |
|-----------|------|----------------|------------|------------------|
| `amd64`   | 1    | 8192           | 4          |                  |
| `amd64`   | 2    | 8192           | 4          |                  |
| `amd64`   | 4    | 8192           | 4          |                  |
| `amd64`   | 4    | 16384          | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 1    | 8192           | 4          |                  |
| `s390x`   | 2    | 8192           | 4          |                  |
| `s390x`   | 4    | 8192           | 4          |                  |
| `s390x`   | 4    | 8192           | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 4    | 16384          | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 4    | 16384          | 4          | `FLEX_DEVICE=VF` |


## Granite-3.3-8b-instruct-FP8 (Precision: 8 bit)

**_Continuous Batching:_**

| Platform | AIUs | Context Length | Batch Size | FLEX Device |
|----------|------|----------------|------------|-------------|
| `amd64`  | 4    | 16384          | 4          | `PF`        |
| `s390x`  | 4    | 8192           | 4          | `PF`        |
| `s390x`  | 4    | 16384          | 4          | `PF`        |
| `s390x`  | 4    | 16384          | 4          | `VF`        |

</details>



<details><summary>second stab</summary>

## Configurations

The following models have been verified to run on vLLM Spyre with the listed configurations.

### Decoder Models

**_Static Batching:_**

| Model          | Platform  | AIUs | Prompt Length | New Tokens | Batch Size | Edits   |
|----------------|-----------|------|---------------|------------|------------|---------|
| Granite-3.3-8b | `ppc64le` | 1    | 2048          | 1024       | 16         | remove? |
| Granite-3.3-8b | `ppc64le` | 4    | 6144          | 2048       | 1          | remove? |
| Granite-3.3-8b | `s390x`   | 4    | 7168          | 1024       | 1          | remove? |

**_Continuous Batching:_**

| Model                | Platform  | AIUs | Context Length | Batch Size | Edits   |
|----------------------|-----------|------|----------------|------------|---------|
| Granite-3.3-8b | `amd64` | 1 | 8192 | 4 | remove? |
| Granite-3.3-8b | `amd64` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b | `s390x` | 1 | 3072 | 16 | PELE |
| Granite-3.3-8b | `s390x` | 1 | 8192 | 4 | remove? |
| Granite-3.3-8b | `s390x` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b | `s390x` | 4 | 32768 | 32 | PELE |
| Granite-3.3-8b (FP8) | `amd64` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 4 | 8192 | 4 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 4 | 16384 | 4 | remove? |
| Granite-3.3-8b (FP8) | `ppc64le` | 1 | 3072 | 16 | remove? |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 8192 | 16 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 32768 | 32 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 4096 | 32 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 32768 | 4 | PELE |
| Granite-3.3-8b (FP8) | `s390x` | 1 | 16384 | 8 | PELE |

### Encoder Models

| Model                               | Platform  | AIUs | Context Length | Batch Size | Edits |
|-------------------------------------|-----------|------|----------------|------------|-------|
| Granite-Embedding-125m-English | `s390x` | 1 | 512 | 1 | PELE |
| Granite-Embedding-125m-English | `s390x` | 1 | 512 | 64 | PELE |
| granite-embedding-278m-multilingual | `s390x` | 1 | 512 | 1 | PELE |
| granite-embedding-278m-multilingual | `s390x` | 1 | 512 | 64 | PELE |
| BAAI/bge-reranker-v2-m3 | `s390x` | 1 | 2048 | 1 | PELE |
| BAAI/bge-reranker-v2-m3 | `s390x` | 1 | 4096 | 1 | PELE |
| BAAI/bge-reranker-v2-m3 | `s390x` | 1 | 8192 | 1 | PELE |
| BAAI/bge-reranker-large | `s390x` | 1 | 512 | 1 | PELE |
| BAAI/bge-reranker-large | `s390x` | 1 | 512 | 64 | PELE |

## Model Files

| Model                               | Download                                                                           |
|-------------------------------------|------------------------------------------------------------------------------------|
| Granite-3.3-8b                      | [Download](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)             |
| Granite-3.3-8b (FP8)                |                                                                                    |
| Granite-Embedding-125m-English      | [Download](https://huggingface.co/ibm-granite/granite-embedding-125m-english)      |
| Granite-Embedding-278m-Multilingual | [Download](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) |
| BAAI/bge-reranker-v2-m3             | [Download](https://huggingface.co/BAAI/bge-reranker-v2-m3)                         |
| BAAI/bge-reranker-large             | [Download](https://huggingface.co/BAAI/bge-reranker-large)                         |

</details>

<details open><summary>final</summary>

<img width="514" height="835" alt="image"
src="https://github.com/user-attachments/assets/7ec1fd70-04f6-4586-873d-b531e4ef522a"
/>

</details>

## Related Issues

#445

---------

Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft September 30, 2025 19:48
Validate that the batch size is <= a tested upper bound

@ckadner ckadner marked this pull request as ready for review September 30, 2025 21:26
@ckadner ckadner changed the title WIP: Manage supported model configurations Manage supported model configurations Sep 30, 2025

ckadner commented Oct 6, 2025

Hi @maxdebayser I added validation code and unit tests for:

  • in the known configuration file, only keep the upper bound
  • test requested warmup_shapes against the upper bound for prompt length, batch size, and max_new_tokens
  • sum of prompt + max_new_tokens does not exceed the max_model_len of supported configs
  • prompt size is a multiple of 64

Kindly take another look? Thank you! 🙏🏻

@maxdebayser

@ckadner, I was having trouble explaining my thoughts as review comments, so I put them in code form: ckadner#19.

@maxdebayser

@ckadner, my assumptions aren't correct. Please disregard some of my previous comments about upper bounds.


@maxdebayser maxdebayser left a comment


I've left a small suggestion, but otherwise it LGTM


ckadner commented Oct 8, 2025

One last item to do:
