
[Fix] Add aarch64 (ARM64) support: auto-enable ONNX Runtime to fix SIGSEGV #17824

Closed

geoHeil wants to merge 1 commit into PaddlePaddle:main from geoHeil:fix/aarch64-sigsegv-onnxruntime

Conversation

geoHeil commented Mar 16, 2026

Summary

  • Auto-detect Linux aarch64 and enable ONNX Runtime inference via HPI to work around PaddlePaddle SIGSEGV
  • Disable PIR executor flags that cause model loading crash on aarch64
  • Disable MKL-DNN (x86-only) on aarch64
  • Add Docker infrastructure to build/test on aarch64

Motivation

PaddlePaddle 3.x pre-built aarch64 wheels crash with SIGSEGV at two sites:

  1. Model loading — PIR executor corrupts std::filesystem::path objects (fixable with FLAGS_enable_pir_in_executor=0)
  2. Inference — null pointer dereference in native kernels (no env flag workaround)

This affects all users on Linux ARM64: Docker on Apple Silicon, Raspberry Pi, AWS Graviton, etc. Issues #17590 and #16685 have been open since October 2025 with no upstream PaddlePaddle fix.

The solution routes inference through ONNX Runtime via PaddleX's HPI (High-Performance Inference), which completely bypasses the broken native kernels.

Changes

paddleocr/_common_args.py

  • Detect Linux aarch64 at runtime
  • Set FLAGS_enable_pir_in_executor=0 and FLAGS_enable_pir_api=0 (fixes crash site 1)
  • Auto-enable HPI when ultra-infer + onnxruntime + paddle2onnx are installed (fixes crash site 2)
  • Disable MKL-DNN on aarch64 (x86-only, would silently fail)
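The detection logic described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: the helper name, the dict-of-overrides return shape, and the package names checked are assumptions based on the bullets above.

```python
import importlib.util
import os
import platform


def apply_aarch64_overrides(env=None, system=None, machine=None):
    """Illustrative sketch of the aarch64 handling described above.

    Sets the PIR environment flags (which must happen before paddle is
    imported to take effect) and returns a dict of inference-argument
    overrides.
    """
    env = os.environ if env is None else env
    system = system or platform.system()
    machine = machine or platform.machine()
    overrides = {}
    if system == "Linux" and machine == "aarch64":
        # Crash site 1: disable the PIR executor before paddle loads.
        env.setdefault("FLAGS_enable_pir_in_executor", "0")
        env.setdefault("FLAGS_enable_pir_api", "0")
        # MKL-DNN is x86-only; disable it explicitly rather than let it
        # silently fail.
        overrides["enable_mkldnn"] = False
        # Crash site 2: route inference through ONNX Runtime via HPI,
        # but only when the required packages are importable.
        hpi_deps = ("ultra_infer", "onnxruntime", "paddle2onnx")
        if all(importlib.util.find_spec(m) for m in hpi_deps):
            overrides["enable_hpi"] = True
    return overrides
```

Keeping the platform check injectable (the `system`/`machine` parameters) makes the branch testable on non-ARM development machines.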

deploy/docker/aarch64/

  • Multi-stage Dockerfile: builds ultra-infer from source for aarch64 with ORT backend, then installs PaddleOCR
  • patch_paddlex_hpi.py: patches PaddleX's backend selection to support aarch64 (until companion PR is merged)
  • test_aarch64.py: exercises both crash sites and verifies text detection works

docker-compose.yml

  • aarch64-test service for automated testing
  • aarch64 service for interactive debugging
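A minimal sketch of what these two services could look like. The service names come from the PR description, but the build context, Dockerfile path, and commands are assumptions for illustration:

```
services:
  aarch64-test:
    build:
      context: .
      dockerfile: deploy/docker/aarch64/Dockerfile
    platform: linux/arm64
    command: python deploy/docker/aarch64/test_aarch64.py
  aarch64:
    build:
      context: .
      dockerfile: deploy/docker/aarch64/Dockerfile
    platform: linux/arm64
    command: /bin/bash
    stdin_open: true
    tty: true
```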

Dependencies

Companion PR: PaddlePaddle/PaddleX#5048 — adds aarch64 to suggest_inference_backend_and_config() so HPI selects ONNX Runtime on ARM64. Once merged, the patch_paddlex_hpi.py workaround in the Dockerfile can be removed.
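The shape of the companion change can be illustrated with a simplified standalone function. The real suggest_inference_backend_and_config() in PaddleX takes model and config arguments and returns a backend plus configuration, so this is only an approximation of the platform branch being added:

```python
def suggest_backend_for_platform(system: str, machine: str) -> str:
    """Illustrative backend choice per platform (not the real PaddleX API)."""
    if system == "Linux" and machine == "aarch64":
        # Native Paddle inference kernels crash on aarch64 (crash site 2),
        # so ONNX Runtime is the suggested backend there.
        return "onnxruntime"
    # On other platforms the native Paddle backend is assumed usable.
    return "paddle"
```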

Test plan

  • Built ultra-infer from source for aarch64 in Docker (native ARM64 on Apple Silicon M2)
  • PaddleOCR(device="cpu") loads all 5 models without SIGSEGV
  • Inference completes successfully, text detection works ('HelloOCR' detected)
  • docker compose up --build aarch64-test passes all tests
  • Verify on AWS Graviton or Raspberry Pi (not yet tested)

To reproduce the Docker test locally:

```
docker build --platform linux/arm64 -t paddleocr-aarch64 -f deploy/docker/aarch64/Dockerfile .
docker run --platform linux/arm64 --rm paddleocr-aarch64
```

Fixes #17590
Related: #16685

🤖 Generated with Claude Code

paddle-bot bot commented Mar 16, 2026

Thanks for your contribution!

CLAassistant commented Mar 16, 2026

CLA assistant check
All committers have signed the CLA.

geoHeil marked this pull request as ready for review March 16, 2026 14:11
luotao1 (Collaborator) commented Mar 18, 2026

Please fix the CodeStyle.

geoHeil force-pushed the fix/aarch64-sigsegv-onnxruntime branch 5 times, most recently from b2cab23 to 78f6eaa on March 18, 2026 07:21
…GSEGV

Pre-built PaddlePaddle aarch64 wheels crash with SIGSEGV during both
model loading (PIR executor) and inference (native kernels). This change:

1. Detects Linux aarch64 at runtime
2. Sets FLAGS_enable_pir_in_executor=0 (fixes model loading crash)
3. Auto-enables HPI with ONNX Runtime when ultra-infer is installed
   (bypasses broken native inference kernels)
4. Disables MKL-DNN on aarch64 (x86-only)
5. Adds Docker infrastructure to build ultra-infer from source for
   aarch64 and run end-to-end tests

Requires companion PR in PaddleX to add aarch64 to the HPI backend
selection function (suggest_inference_backend_and_config).

Fixes PaddlePaddle#17590
Related: PaddlePaddle#16685

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
geoHeil force-pushed the fix/aarch64-sigsegv-onnxruntime branch from 78f6eaa to 325f4f2 on March 18, 2026 07:28
Bobholamovic (Member) left a comment


Thanks for the detailed and professional PR! The overall approach is clear and well thought out.

From what I can see, this is a practical short-term fix. However, from a longer-term evolution and maintenance perspective, I have a few concerns that I'd like to discuss with you:

First, regarding the PaddlePaddle ABI issue on aarch64: if the root cause lies within PaddlePaddle itself, would it make more sense to address it at the framework level in the long run, rather than working around it in downstream libraries (such as PaddleOCR) by switching inference engines? For PaddleOCR, PaddlePaddle-based inference is still a core capability, and many parts of the existing documentation and codebase assume its presence. While a workaround is understandable in the short term, relying on this approach long-term could introduce inconsistencies and additional maintenance overhead.

Second, about introducing ultra-infer: from a user perspective, building ultra-infer from source can be quite challenging, especially without comprehensive documentation. While providing prebuilt binaries can lower the barrier to entry, it also introduces additional maintenance overhead for the PaddleOCR dev team, particularly when multiple build configurations (e.g., different Dockerfiles) need to be maintained across different repositories (PaddleOCR and PaddleX). As additional context, we are planning to support inference directly via ONNX Runtime Python bindings in future PaddleOCR releases, to further simplify installation and usage. Meanwhile, for historical reasons, ultra-infer may gradually move toward a less actively maintained or even deprecated state. From this perspective, the long-term sustainability of this approach may need further consideration.

Finally, from a design standpoint, the current handling of aarch64 (e.g., auto-enabling HPI) feels somewhat ad hoc and case-specific, which makes the overall design less clean and may negatively impact long-term maintainability. In addition, ultra-infer is primarily maintained in PaddleX, while its aarch64 adaptations and build processes (including Dockerfiles) are maintained within PaddleOCR. This split in responsibility feels somewhat fragmented and could further increase maintenance costs over time.

Overall, this PR is valuable as a short-term workaround, but I do have some concerns about its role in the long-term direction. I’d really appreciate hearing your thoughts, especially if there are constraints or context that I might be missing.

geoHeil (Author) commented Mar 19, 2026

Thanks for your response! I am not a core Paddle developer, so your architectural points were unclear to me.

What you write makes sense. Do you have a timeline for when you intend to support aarch64? Or would it possibly make sense to go there sooner?

Bobholamovic (Member) commented Mar 19, 2026

> Thanks for your response! I am not a core Paddle developer, so your architectural points were unclear to me. What you write makes sense. Do you have a timeline for when you intend to support aarch64? Or would it possibly make sense to go there sooner?

For "enabling inference on the aarch64 architecture in PaddleOCR via ONNX Runtime", I expect it will likely be around May or June.

geoHeil (Author) commented Mar 19, 2026

If that holds, that would be a viable timeline on my end. Should we close the PR then?

Bobholamovic (Member)

Works for me; feel free to close it.

geoHeil closed this Mar 20, 2026


Successfully merging this pull request may close these issues.

Segmentation Fault when Loading PP-LCNet_x1_0_doc_ori Model on ARM CPU with PaddleOCR

4 participants