Description
🔎 Search before asking
- I have searched the MinerU Readme and found no similar bug report.
- I have searched the MinerU Issues and found no similar bug report.
- I have searched the MinerU Discussions and found no similar bug report.
🤖 Consult the online AI assistant for assistance
- I have consulted the online AI assistant but was unable to obtain a solution to the issue.
Description of the bug
Batch tuning via batch_ratio is effectively disabled on 'mps': VRAM is never reported for that device in pipeline_analyze.py, so the ratio is only computed for 'npu' and 'cuda'.
Adding the following, which conservatively approximates usable VRAM as 0.7 of available system memory (the GPU shares system memory on Apple Silicon), gives me reasonable performance gains:
```python
if str(device).startswith('mps'):
    # MPS-specific VRAM detection
    if vram is None:
        try:
            import torch
            if torch.backends.mps.is_available():
                # MPS doesn't expose a direct VRAM query; on Apple Silicon the
                # GPU shares system memory, so use total system memory as a proxy.
                import psutil
                system_memory_gb = psutil.virtual_memory().total / (1024**3)
                # Conservative estimate: 70% of system memory for GPU tasks
                vram = system_memory_gb * 0.7
                logger.info(f'MPS device detected, estimated shared memory: {vram:.1f} GB')
        except Exception as e:
            logger.warning(f'Could not determine MPS memory: {e}')
        if vram is None:
            vram = 16  # Conservative default for Apple Silicon
    # Determine batch ratio based on available memory
    if vram >= 32:
        batch_ratio = 16
    elif vram >= 24:
        batch_ratio = 12
    elif vram >= 16:
        batch_ratio = 8
    else:
        batch_ratio = 4
    logger.info(f'MPS device detected, estimated VRAM: {vram:.1f} GB, using batch_ratio: {batch_ratio}')
```

(Note: the fallback `vram = 16` is moved out of the `except` block so it also covers the case where no exception is raised but MPS is unavailable and `vram` stays `None`.)

Apart from that, these rather aggressive values in batch_analyze.py also produce improved performance:
```python
YOLO_LAYOUT_BASE_BATCH_SIZE = 8
MFD_BASE_BATCH_SIZE = 8
MFR_BASE_BATCH_SIZE = 16
OCR_DET_BASE_BATCH_SIZE = 16
TABLE_ORI_CLS_BATCH_SIZE = 32
TABLE_Wired_Wireless_CLS_BATCH_SIZE = 32
```

(Note that the constants impacted by batch_ratio are preserved in line with HEAD.)
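For what it's worth, the memory-to-ratio mapping and its interaction with the base batch sizes can be sanity-checked in isolation. The sketch below is standalone illustration, not MinerU code; it assumes that ratio-dependent stages use an effective batch of `base size * batch_ratio`, in line with upstream behavior:

```python
def pick_batch_ratio(vram_gb: float) -> int:
    """Mirror the memory thresholds from the patch above."""
    if vram_gb >= 32:
        return 16
    elif vram_gb >= 24:
        return 12
    elif vram_gb >= 16:
        return 8
    return 4

# Base sizes from the proposed batch_analyze.py values (ratio-dependent ones).
BASE_SIZES = {
    'YOLO_LAYOUT_BASE_BATCH_SIZE': 8,
    'MFD_BASE_BATCH_SIZE': 8,
    'MFR_BASE_BATCH_SIZE': 16,
    'OCR_DET_BASE_BATCH_SIZE': 16,
}

def effective_batch_sizes(vram_gb: float) -> dict:
    """Assumed scaling: effective batch = base size * batch_ratio."""
    ratio = pick_batch_ratio(vram_gb)
    return {name: base * ratio for name, base in BASE_SIZES.items()}
```

On an M3 Max with 64 GB, the 0.7 estimate yields 44.8 GB, so `pick_batch_ratio` selects 16 and e.g. MFR runs with an effective batch of 256.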
I am currently also experimenting with the binning for the OCR-det batch step, where more aggressive binning of the images (into larger patches) sometimes yields a tenfold increase in performance. I'll open a separate issue for it once I get time to test on my CUDA/24 GB device as well.
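The binning idea can be sketched roughly as follows. This is a hypothetical standalone helper, not the actual OCR-det code: quantize each image's dimensions up to a coarse grid so that similarly-shaped crops land in the same bucket and can be batched together with minimal padding; a larger `bin_step` produces fewer, larger buckets:

```python
from collections import defaultdict

def bin_images_by_size(sizes, bin_step=64, max_batch=16):
    """Group (height, width) pairs into coarse bins for batched inference.

    Each dimension is rounded up to the nearest multiple of bin_step, so
    images in one bucket pad to the same shape. Returns a list of batches,
    each a list of indices into `sizes`, capped at max_batch per batch.
    """
    buckets = defaultdict(list)
    for idx, (h, w) in enumerate(sizes):
        # Ceil-divide to quantize both dims up to the bin grid.
        key = (-(-h // bin_step) * bin_step, -(-w // bin_step) * bin_step)
        buckets[key].append(idx)
    batches = []
    for key in sorted(buckets):
        idxs = buckets[key]
        for i in range(0, len(idxs), max_batch):
            batches.append(idxs[i:i + max_batch])
    return batches
```

With `bin_step=64`, crops of 100x200 and 110x210 share the 128x256 bucket and run as one batch, while a 500x300 crop is batched separately.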
How to reproduce the bug
Operating System Mode
macOS
Operating System Version
macOS 15.7 on M3 Max 64GB
Python version
3.11
Software version (mineru --version)
=2.5
Backend name
pipeline
Device mode
mps