
V100: vlm-vllm-async-engine unusable on the vllm-openai:v0.10.2 base image #3909

@zorogong

Description


🔎 Search before asking | 提交之前请先搜索

  • I have searched the MinerU Readme and found no similar bug report.
  • I have searched the MinerU Issues and found no similar bug report.
  • I have searched the MinerU Discussions and found no similar bug report.

🤖 Consult the online AI assistant for assistance | 在线 AI 助手咨询

Description of the bug | 错误描述

The image was built with the Dockerfile from the repository, shown below. Since the GPU is a V100, vllm-openai:v0.10.2 is used as the base image.

# Use DaoCloud mirrored vllm image for China region for gpu with Ampere architecture and above (Compute Capability>=8.0)
# Compute Capability version query (https://developer.nvidia.com/cuda-gpus)
# FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.1.1
                                                                                                              
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.1.1
                                                                                                              
# Use DaoCloud mirrored vllm image for China region for gpu with Turing architecture and below (Compute Capability<8.0)
FROM docker.m.daocloud.io/vllm/vllm-openai:v0.10.2
                                                                                                              
# Use the official vllm image
# FROM vllm/vllm-openai:v0.10.2

# Install libgl for opencv support & Noto fonts for Chinese characters
RUN apt-get update && \
    apt-get install -y \
        fonts-noto-core \
        fonts-noto-cjk \
        fontconfig \
        libgl1 && \
    fc-cache -fv && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
                                                                                                              
# Install mineru latest
RUN python3 -m pip install -U 'mineru[all]' -i https://mirrors.aliyun.com/pypi/simple --break-system-packages && \
    python3 -m pip cache purge
                                                                                                              
# Download models and update the configuration file
RUN /bin/bash -c "mineru-models-download -s modelscope -m all"
                                                                                                              
# Set the entry point to activate the virtual environment and run the command line tool
ENTRYPOINT ["/bin/bash", "-c", "export MINERU_MODEL_SOURCE=local && exec \"$@\"", "--"]

After deployment, calling the API with the vlm-vllm-async-engine backend returns the following error:
mineru-api | (EngineCore_DP0 pid=32) raise RuntimeError("FlashInfer requires GPUs with sm75 or higher")
mineru-api | (EngineCore_DP0 pid=32) RuntimeError: FlashInfer requires GPUs with sm75 or higher

Setting VLLM_USE_V1=0 instead leads to other errors.
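The error follows from the GPU generation: FlashInfer requires compute capability 7.5 (sm75) or higher, while a Tesla V100 is 7.0 (sm70). A minimal sketch of that check (the capability values are hardcoded from NVIDIA's published specs; on a live host the value can be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent drivers):

```shell
# Compare a V100's compute capability against FlashInfer's minimum.
cap="7.0"   # what a Tesla V100 reports (Volta, sm70)
min="7.5"   # FlashInfer's requirement per the error message (sm75)
if awk -v c="$cap" -v m="$min" 'BEGIN { exit !(c >= m) }'; then
  echo "FlashInfer supported"
else
  echo "FlashInfer unsupported"   # this branch is taken for the V100
fi
```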

How to reproduce the bug | 如何复现

Deployed with the following docker-compose file:
root@ecs-2025f066-ai:/comfyui_dir/mineru2.5# cat compose.yaml
services:
  mineru-api:
    image: mineru:2.5
    container_name: mineru-api
    restart: always
    profiles: ["api"]
    ports:
      - 5007:8000
    environment:
      MINERU_MODEL_SOURCE: local
    entrypoint: mineru-api
    command:
      --host 0.0.0.0
      --port 8000
      --gpu-memory-utilization 0.7  # If running on a single GPU and VRAM runs short, reduce the KV cache size with this parameter; if VRAM issues persist, lower it further to 0.4 or below.
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
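One avenue that may be worth trying (an assumption, not verified on this setup): vLLM honors the `VLLM_ATTENTION_BACKEND` environment variable for selecting an attention backend, so forcing one without FlashInfer's sm75 requirement could be attempted from the compose `environment` block, e.g.:

```yaml
environment:
  MINERU_MODEL_SOURCE: local
  # Assumption: XFORMERS is a selectable vLLM attention backend without
  # FlashInfer's sm75 floor; whether it works on a V100 is untested here.
  VLLM_ATTENTION_BACKEND: XFORMERS
```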

Operating System Mode | 操作系统类型

Linux

Operating System Version| 操作系统版本

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
PRETTY_NAME="Ubuntu 22.04.2 LTS"

Python version | Python 版本

3.12

Software version | 软件版本 (mineru --version)

>=2.5

Backend name | 解析后端

vlm

Device mode | 设备模式

cuda

Labels

bug (Something isn't working)
