OVMS error when serving Qwen 2.5 VL with GPU #3635

@jpm-canonical

Describe the bug

OVMS logs the following error while initializing the LLM node, then exits.

[2025-09-05 10:48:02.793][60609][serving][error][servable_initializer.cpp:145] Error during llm node initialization for models_path: /model-repo/Qwen2.5-VL-7B-Instruct-int4-npu-ov/./ exception: Exception from src/inference/src/cpp/core.cpp:126:
Exception from src/inference/src/dev/plugin.cpp:58:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Program build failed(0_part_0):
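
For triage, the nested exception chain above can be reduced to the source locations it names. The following is a minimal sketch, not part of OVMS: the function names `exception_chain` and `innermost_failure` are hypothetical, and it assumes the log has been saved to a file and that `grep` supports `-o` (GNU, BSD, and BusyBox grep do).

```shell
#!/bin/sh
# Hypothetical helpers for reading an OVMS log; names are illustrative only.

# Print every OpenVINO source location (src/...cpp:LINE) found in the log,
# in the order it appears, i.e. outermost exception first.
exception_chain() {
    grep -o 'src/[A-Za-z0-9_/.]*\.cpp:[0-9][0-9]*' "$1"
}

# Print only the last location in the chain, which is the innermost check
# that failed.
innermost_failure() {
    exception_chain "$1" | tail -n 1
}

# Usage (path is an assumption; substitute the real log file):
#   innermost_failure omvs-qwen-logs.txt
```

Applied to the lines quoted above, the last location is in `src/plugins/intel_gpu/`, which suggests the failure originates in the GPU plugin's program build rather than in OVMS itself. It does not say why the build failed.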

To Reproduce
Steps to reproduce the behavior:

  1. Export Qwen 2.5 VL: optimum-cli export openvino --weight-format int4 --sym --group-size -1 --model Qwen/Qwen2.5-VL-7B-Instruct Qwen2.5-VL-7B-Instruct-int4-npu-ov
  2. OVMS launch command: ovms --rest_port 8080 --rest_bind_address 127.0.0.1 --source_model Qwen2.5-VL-7B-Instruct-int4-npu-ov --model_repository_path /model-repo --target_device GPU --log_level DEBUG --task text_generation
  3. See error
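
The steps above can be collected into one script for easier re-runs. This is a sketch under assumptions: the `run` wrapper and the `DRY_RUN` variable are illustrative additions, and the script assumes it is started from inside the model repository directory so that the exported folder lands under `/model-repo`. The commands and flags are copied from steps 1 and 2. By default it only prints the commands.

```shell
#!/bin/sh
# Reproduction sketch. Prints the commands by default; set DRY_RUN=0 to run them.
MODEL_ID="Qwen/Qwen2.5-VL-7B-Instruct"
MODEL_DIR="Qwen2.5-VL-7B-Instruct-int4-npu-ov"
REPO_PATH="/model-repo"   # assumption: the export is run from inside this directory
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: export the model to OpenVINO IR with int4 weights.
export_model() {
    run optimum-cli export openvino --weight-format int4 --sym --group-size -1 \
        --model "$MODEL_ID" "$MODEL_DIR"
}

# Step 2: launch OVMS against the exported model on the GPU device.
launch_ovms() {
    run ovms --rest_port 8080 --rest_bind_address 127.0.0.1 \
        --source_model "$MODEL_DIR" --model_repository_path "$REPO_PATH" \
        --target_device GPU --log_level DEBUG --task text_generation
}

export_model
launch_ovms
```

Running it with `DRY_RUN=0` on the affected machine should reach step 3 and show the error.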

Expected behavior
Currently, OVMS prints a vague error that does not describe the cause. The error should state the actual cause; for example, if the problem is an unsupported GPU, that should be reported.

Logs

omvs-qwen-logs.txt

Configuration

  1. OVMS version: 2025.3.0.6e2e910de
  2. OVMS config.json file: none
  3. CPU, accelerator's versions if applicable: n/a
  4. Model repository directory structure
$ tree model-repo
model-repo
├── Qwen2.5-VL-7B-Instruct-int4-npu-ov
│   ├── added_tokens.json
│   ├── chat_template.jinja
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── openvino_config.json
│   ├── openvino_detokenizer.bin
│   ├── openvino_detokenizer.xml
│   ├── openvino_language_model.bin
│   ├── openvino_language_model.xml
│   ├── openvino_text_embeddings_model.bin
│   ├── openvino_text_embeddings_model.xml
│   ├── openvino_tokenizer.bin
│   ├── openvino_tokenizer.xml
│   ├── openvino_vision_embeddings_merger_model.bin
│   ├── openvino_vision_embeddings_merger_model.xml
│   ├── openvino_vision_embeddings_model.bin
│   ├── openvino_vision_embeddings_model.xml
│   ├── preprocessor_config.json
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   └── vocab.json
└── README.md

2 directories, 25 files
  5. Model or publicly available similar model that reproduces the issue: see step 1 under To Reproduce

Additional context

The same model and OVMS command work on an Arc A580 GPU. Running them on an older iGPU produces the error reported here.
