batch_ratio calculation on 'mps' devices in pipeline mode #3882

@stelf

Description

🔎 Search before asking | 提交之前请先搜索

  • I have searched the MinerU Readme and found no similar bug report.
  • I have searched the MinerU Issues and found no similar bug report.
  • I have searched the MinerU Discussions and found no similar bug report.

Description of the bug | 错误描述

Batch tuning via batch_ratio is effectively disabled on 'mps': VRAM is never reported for that device, so pipeline_analyze.py skips the calculation entirely and only computes the ratio for 'npu' and 'cuda'.

Adding the following, which conservatively approximates the usable VRAM as 0.7 of the available system memory, gives me reasonable performance gains.

if str(device).startswith('mps'):
    # MPS-specific VRAM detection
    if vram is None:
        try:
            import torch
            if torch.backends.mps.is_available():
                # MPS doesn't expose a direct VRAM query; on Apple Silicon the GPU
                # shares system memory, so use it as a proxy.
                import psutil
                system_memory_gb = psutil.virtual_memory().total / (1024**3)
                # Conservative estimate: 70% of system memory for GPU tasks
                vram = system_memory_gb * 0.7
                logger.info(f'MPS device detected, estimated shared memory: {vram:.1f} GB')
        except Exception as e:
            logger.warning(f'Could not determine MPS memory: {e}')
        if vram is None:
            vram = 16  # Conservative default for Apple Silicon

    # Determine batch ratio based on available memory
    if vram >= 32:
        batch_ratio = 16
    elif vram >= 24:
        batch_ratio = 12
    elif vram >= 16:
        batch_ratio = 8
    else:
        batch_ratio = 4

    logger.info(f'MPS device detected, estimated VRAM: {vram:.1f} GB, using batch_ratio: {batch_ratio}')
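
Plugging the reporting machine's numbers through this heuristic (M3 Max with 64 GB of unified memory), and assuming psutil sees the full 64 GB, gives the following; this is just the snippet above evaluated by hand, not MinerU output.

# Hand-evaluation of the heuristic above for a 64 GB Apple Silicon machine,
# assuming psutil reports the full 64 GB of unified memory.
system_memory_gb = 64
vram = system_memory_gb * 0.7          # ~44.8 GB estimated usable memory
batch_ratio = 16 if vram >= 32 else 12 if vram >= 24 else 8 if vram >= 16 else 4
print(f'{vram:.1f} GB -> batch_ratio {batch_ratio}')   # 44.8 GB -> batch_ratio 16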

Apart from that, these rather aggressive values in batch_analyze.py also produce improved performance:

YOLO_LAYOUT_BASE_BATCH_SIZE = 8
MFD_BASE_BATCH_SIZE = 8
MFR_BASE_BATCH_SIZE = 16
OCR_DET_BASE_BATCH_SIZE = 16
TABLE_ORI_CLS_BATCH_SIZE = 32
TABLE_Wired_Wireless_CLS_BATCH_SIZE = 32

(note that the constants that batch_ratio scales are kept in line with HEAD)
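
For context on how these interact with the ratio above: my reading of batch_analyze.py is that batch_ratio multiplies some of these base sizes at runtime while others are used directly, which is why the scaled ones were left at their HEAD values. The snippet below only illustrates that assumed relationship; it is not code from the repo, and which constants actually get scaled is determined by batch_analyze.py itself.

# Illustration of the assumed scaling relationship (not verbatim MinerU code).
batch_ratio = 16                    # e.g. the value derived above for 64 GB of unified memory

MFR_BASE_BATCH_SIZE = 16            # assumed here to be scaled by batch_ratio
TABLE_ORI_CLS_BATCH_SIZE = 32       # assumed here to be used as-is

mfr_batch_size = MFR_BASE_BATCH_SIZE * batch_ratio   # 16 * 16 = 256 formula crops per pass
table_cls_batch_size = TABLE_ORI_CLS_BATCH_SIZE      # 32 table crops per classification pass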

I am currently also experimenting with the binning for the OCR-det batch step, where more aggressive binning of the images (into larger patches) sometimes results in a tenfold increase in performance; a rough sketch of the idea follows below. I'll open a separate issue for it once I also get some time to test on my CUDA/24GB device.
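
To make the binning idea concrete, here is a minimal, hypothetical sketch: group page crops whose shapes fall into the same coarse bucket, so the detection model runs fewer, larger batches. The function and parameter names (bin_images_by_shape, bucket) are mine and not taken from MinerU; the real OCR-det batching lives in batch_analyze.py.

# Hypothetical sketch of coarse shape binning for the OCR-det step; names and
# thresholds are illustrative only, not MinerU code.
from collections import defaultdict

def bin_images_by_shape(images, bucket=640):
    """Group numpy images whose (height, width) land in the same coarse bucket
    so they can be padded to a common size and detected as one larger batch."""
    bins = defaultdict(list)
    for img in images:
        h, w = img.shape[:2]
        bins[(h // bucket, w // bucket)].append(img)
    return list(bins.values())

# Usage idea: run detection per bin instead of per image
# for batch in bin_images_by_shape(page_crops):
#     det_results = ocr_det_model(batch)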

How to reproduce the bug | 如何复现


Operating System Mode | 操作系统类型

MacOS

Operating System Version| 操作系统版本

OS 15.7 on M3 Max 64GB

Python version | Python 版本

3.11

Software version | 软件版本 (mineru --version)

=2.5

Backend name | 解析后端

pipeline

Device mode | 设备模式

mps
