
V100: vlm-vllm-async-engine unusable on the vllm-openai:v0.10.2 base image #3909

@zorogong

Description


🔎 Search before asking | 提交之前请先搜索

  • I have searched the MinerU Readme and found no similar bug report.
  • I have searched the MinerU Issues and found no similar bug report.
  • I have searched the MinerU Discussions and found no similar bug report.

🤖 Consult the online AI assistant for assistance | 在线 AI 助手咨询

Description of the bug | 错误描述

The image was built with the Dockerfile from the repository, shown below. Since the GPU is a V100, vllm-openai:v0.10.2 is used as the base image.

# Use DaoCloud mirrored vllm image for China region for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
# FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.1.1
                                                                                                              
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.1.1
                                                                                                              
# Use DaoCloud mirrored vllm image for China region for gpu with Turing architecture and below (Compute Capability<8.0)
FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.2
                                                                                                              
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.2

# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
    apt-get install -y \
        fonts-noto-core \
        fonts-noto-cjk \
        fontconfig \
        libgl1 && \
    fc-cache -fv && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
                                                                                                              
# Install mineru latest
RUN python3 -m pip install -U 'mineru[all]' -i https://mirrors.aliyun.com/pypi/simple --break-system-packages && \
    python3 -m pip cache purge
                                                                                                              
# Download models and update the configuration file
RUN /bin/bash -c "mineru-models-download -s modelscope -m all"
                                                                                                              
# Set the entry point to activate the virtual environment and run the command line tool
ENTRYPOINT ["/bin/bash", "-c", "export MINERU_MODEL_SOURCE=local && exec \"$@\"", "--"]

After deployment, calling the API with the vlm-vllm-async-engine backend returns the following error:
mineru-api | (EngineCore_DP0 pid=32) raise RuntimeError("FlashInfer requires GPUs with sm75 or higher")
mineru-api | (EngineCore_DP0 pid=32) RuntimeError: FlashInfer requires GPUs with sm75 or higher

Setting VLLM_USE_V1=0 instead leads to other errors.
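The error follows from the GPU generation: FlashInfer requires compute capability 7.5 (sm75) or higher, while a Tesla V100 is 7.0 (sm70). A minimal sketch of that check (the capability values are hardcoded from NVIDIA's published specs; on a live host the value can be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent drivers):

```shell
# Compare a V100's compute capability against FlashInfer's minimum.
cap="7.0"   # what a Tesla V100 reports (Volta, sm70)
min="7.5"   # FlashInfer's requirement per the error message (sm75)
if awk -v c="$cap" -v m="$min" 'BEGIN { exit !(c >= m) }'; then
  echo "FlashInfer supported"
else
  echo "FlashInfer unsupported"   # this branch is taken for the V100
fi
```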

How to reproduce the bug | 如何复现

Deployed with the following docker-compose file:
root@ecs-2025f066-ai:/comfyui_dir/mineru2.5# cat compose.yaml
services:
  mineru-api:
    image: mineru:2.5
    container_name: mineru-api
    restart: always
    profiles: ["api"]
    ports:
      - 5007:8000
    environment:
      MINERU_MODEL_SOURCE: local
    entrypoint: mineru-api
    command:
      --host 0.0.0.0
      --port 8000
      --gpu-memory-utilization 0.7  # If running on a single GPU and VRAM runs short, reduce the KV cache size with this parameter; if VRAM issues persist, lower it further to 0.4 or below.
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
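One avenue that may be worth trying (an assumption, not verified on this setup): vLLM honors the `VLLM_ATTENTION_BACKEND` environment variable for selecting an attention backend, so forcing one without FlashInfer's sm75 requirement could be attempted from the compose `environment` block, e.g.:

```yaml
environment:
  MINERU_MODEL_SOURCE: local
  # Assumption: XFORMERS is a selectable vLLM attention backend without
  # FlashInfer's sm75 floor; whether it works on a V100 is untested here.
  VLLM_ATTENTION_BACKEND: XFORMERS
```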

Operating System Mode | 操作系统类型

Linux

Operating System Version| 操作系统版本

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
PRETTY_NAME="Ubuntu 22.04.2 LTS"

Python version | Python 版本

3.12

Software version | 软件版本 (mineru --version)

>=2.5

Backend name | 解析后端

vlm

Device mode | 设备模式

cuda

Labels

bug (Something isn't working)
