Manage supported model configurations #445
- Add config file for models and runtime parameters
- Add validation code to compare requested model and runtime parameters with supported configurations
- Log warning when requested configuration is not supported
- Use configuration file in documentation

Signed-off-by: Christian Kadner <[email protected]>
# Description

Document the supported models + configurations.

**Rendered preview:** https://vllm-spyre--479.org.readthedocs.build/en/479/user_guide/supported_models.html#configurations

<details><summary>first stab</summary>

## Granite-3.3-8b-instruct (Precision: 16 bit)

**_Static Batching:_**

| Platform  | AIUs | Prompt Length | New Tokens | Batch Size | Use Case |
|-----------|------|---------------|------------|------------|----------|
| `ppc64le` | 1    | 2048          | 1024       | 16         | EE       |
| `ppc64le` | 4    | 6144          | 2048       | 1          | RAG      |
| `s390x`   | 4    | 7168          | 1024       | 1          | RAG      |

**_Continuous Batching:_**

| Platform  | AIUs | Context Length | Batch Size | Comments         |
|-----------|------|----------------|------------|------------------|
| `amd64`   | 1    | 8192           | 4          |                  |
| `amd64`   | 2    | 8192           | 4          |                  |
| `amd64`   | 4    | 8192           | 4          |                  |
| `amd64`   | 4    | 16384          | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 1    | 8192           | 4          |                  |
| `s390x`   | 2    | 8192           | 4          |                  |
| `s390x`   | 4    | 8192           | 4          |                  |
| `s390x`   | 4    | 8192           | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 4    | 16384          | 4          | `FLEX_DEVICE=PF` |
| `s390x`   | 4    | 16384          | 4          | `FLEX_DEVICE=VF` |

## Granite-3.3-8b-instruct-FP8 (Precision: 8 bit)

**_Continuous Batching:_**

| Platform | AIUs | Context Length | Batch Size | FLEX Device |
|----------|------|----------------|------------|-------------|
| `amd64`  | 4    | 16384          | 4          | `PF`        |
| `s390x`  | 4    | 8192           | 4          | `PF`        |
| `s390x`  | 4    | 16384          | 4          | `PF`        |
| `s390x`  | 4    | 16384          | 4          | `VF`        |

</details>

<details><summary>second stab</summary>

## Configurations

The following models have been verified to run on vLLM Spyre with the listed configurations.

### Decoder Models

**_Static Batching:_**

| Model          | Platform  | AIUs | Prompt Length | New Tokens | Batch Size | Edits   |
|----------------|-----------|------|---------------|------------|------------|---------|
| Granite-3.3-8b | `ppc64le` | 1    | 2048          | 1024       | 16         | remove? |
| Granite-3.3-8b | `ppc64le` | 4    | 6144          | 2048       | 1          | remove? |
| Granite-3.3-8b | `s390x`   | 4    | 7168          | 1024       | 1          | remove? |

**_Continuous Batching:_**

| Model                | Platform  | AIUs | Context Length | Batch Size | Edits   |
|----------------------|-----------|------|----------------|------------|---------|
| Granite-3.3-8b       | `amd64`   | 1    | 8192           | 4          | remove? |
| Granite-3.3-8b       | `amd64`   | 4    | 16384          | 4          | remove? |
| Granite-3.3-8b       | `s390x`   | 1    | 3072           | 16         | PELE    |
| Granite-3.3-8b       | `s390x`   | 1    | 8192           | 4          | remove? |
| Granite-3.3-8b       | `s390x`   | 4    | 16384          | 4          | remove? |
| Granite-3.3-8b       | `s390x`   | 4    | 32768          | 32         | PELE    |
| Granite-3.3-8b (FP8) | `amd64`   | 4    | 16384          | 4          | remove? |
| Granite-3.3-8b (FP8) | `s390x`   | 4    | 8192           | 4          | remove? |
| Granite-3.3-8b (FP8) | `s390x`   | 4    | 16384          | 4          | remove? |
| Granite-3.3-8b (FP8) | `s390x`   | 4    | 16384          | 4          | remove? |
| Granite-3.3-8b (FP8) | `ppc64le` | 1    | 3072           | 16         | remove? |
| Granite-3.3-8b (FP8) | `s390x`   | 1    | 8192           | 16         | PELE    |
| Granite-3.3-8b (FP8) | `s390x`   | 1    | 32768          | 32         | PELE    |
| Granite-3.3-8b (FP8) | `s390x`   | 1    | 4096           | 32         | PELE    |
| Granite-3.3-8b (FP8) | `s390x`   | 1    | 32768          | 4          | PELE    |
| Granite-3.3-8b (FP8) | `s390x`   | 1    | 16384          | 8          | PELE    |

### Encoder Models

| Model                               | Platform | AIUs | Context Length | Batch Size | Edits |
|-------------------------------------|----------|------|----------------|------------|-------|
| Granite-Embedding-125m-English      | `s390x`  | 1    | 512            | 1          | PELE  |
| Granite-Embedding-125m-English      | `s390x`  | 1    | 512            | 64         | PELE  |
| granite-embedding-278m-multilingual | `s390x`  | 1    | 512            | 1          | PELE  |
| granite-embedding-278m-multilingual | `s390x`  | 1    | 512            | 64         | PELE  |
| BAAI/bge-reranker-v2-m3             | `s390x`  | 1    | 2048           | 1          | PELE  |
| BAAI/bge-reranker-v2-m3             | `s390x`  | 1    | 4096           | 1          | PELE  |
| BAAI/bge-reranker-v2-m3             | `s390x`  | 1    | 8192           | 1          | PELE  |
| BAAI/bge-reranker-large             | `s390x`  | 1    | 512            | 1          | PELE  |
| BAAI/bge-reranker-large             | `s390x`  | 1    | 512            | 64         | PELE  |

## Model Files

| Model                               | Download                                                                           |
|-------------------------------------|------------------------------------------------------------------------------------|
| Granite-3.3-8b                      | [Download](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)             |
| Granite-3.3-8b (FP8)                |                                                                                    |
| Granite-Embedding-125m-English      | [Download](https://huggingface.co/ibm-granite/granite-embedding-125m-english)      |
| Granite-Embedding-278m-Multilingual | [Download](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) |
| BAAI/bge-reranker-v2-m3             | [Download](https://huggingface.co/BAAI/bge-reranker-v2-m3)                         |
| BAAI/bge-reranker-large             | [Download](https://huggingface.co/BAAI/bge-reranker-large)                         |

</details>

<details open><summary>final</summary>

<img width="514" height="835" alt="image" src="https://github.com/user-attachments/assets/7ec1fd70-04f6-4586-873d-b531e4ef522a" />

</details>

## Related Issues

#445

---------

Signed-off-by: Christian Kadner <[email protected]>
Validate that the batch size is <= a tested upper bound Signed-off-by: Christian Kadner <[email protected]>
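
To make the upper-bound idea in this commit concrete, here is a minimal sketch, not the code from the PR. The table layout, the key `(tp_size, max_model_len)`, and the function name `batch_size_within_bound` are all hypothetical; the two sample entries are borrowed from the draft tables in the description purely for illustration.

```python
from typing import Dict, Tuple

# Hypothetical table: (tp_size, max_model_len) -> largest batch size tested.
# Sample values are illustrative only, taken from the draft tables.
TESTED_MAX_BATCH: Dict[Tuple[int, int], int] = {
    (1, 3072): 16,
    (4, 32768): 32,
}


def batch_size_within_bound(tp_size: int, max_model_len: int,
                            batch_size: int) -> bool:
    """Return True when batch_size is positive and <= the tested upper bound."""
    upper = TESTED_MAX_BATCH.get((tp_size, max_model_len))
    if upper is None:
        # No tested configuration exists for this combination.
        return False
    return 0 < batch_size <= upper
```

With this shape, a request for a batch size of 8 at `(1, 3072)` passes, while 32 at the same key does not. Whether an upper bound is the right rule for every parameter is a separate question for the review.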
Hi @maxdebayser, I added validation code and unit tests for:
Kindly take another look? Thank you! 🙏🏻
@ckadner, I was having trouble explaining my thoughts as review comments, so I put them in code form: ckadner#19.
@ckadner, my assumptions aren't correct. Please disregard some of my previous comments about upper bounds.
I've left a small suggestion, but otherwise it LGTM
One last item to do:
Description
Example:
WARNING 09-05 18:46:43 [runtime_config_validator.py:107] The requested configuration is not supported for model 'ibm-ai-platform/micro-g3.3-8b-instruct-1b': RuntimeConfiguration(platform=, cb=True, tp_size=1, max_model_len=128, max_num_seqs=2, num_blocks=0, warmup_shapes=None)
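
For readers who want to see how a warning like the one above could be produced, the following is a rough sketch under stated assumptions. Only the `RuntimeConfiguration` field names are taken from the log line; the field types, the `SUPPORTED` registry, the model name `example-org/example-model`, the platform string, and the `validate` function are hypothetical and do not claim to match `runtime_config_validator.py`.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("runtime_config_validator_sketch")


@dataclass(frozen=True)
class RuntimeConfiguration:
    # Field names mirror the log line; types and defaults are assumptions.
    platform: str
    cb: bool
    tp_size: int
    max_model_len: int
    max_num_seqs: int
    num_blocks: int = 0
    warmup_shapes: Optional[tuple] = None


# Hypothetical registry: model name -> configurations that were verified.
SUPPORTED = {
    "example-org/example-model": [
        RuntimeConfiguration("s390x", True, 1, 8192, 4),
    ],
}


def validate(model: str, requested: RuntimeConfiguration) -> bool:
    """Return True if the configuration is listed; otherwise warn and return False."""
    if requested in SUPPORTED.get(model, []):
        return True
    logger.warning(
        "The requested configuration is not supported for model '%s': %s",
        model, requested)
    return False
```

The sketch only warns and returns a flag rather than raising, which matches the "log warning when requested configuration is not supported" behavior described in the commit message.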
TODO:
- `num_blocks` (cpu, gpu ...override)?
- `get_warmup_shapes_from_envs()` does not yield same as `platform.py`:`cls._warmup_shapes`
Review suggestions:
Related Issues
#435