Commit 95dc316

Bobholamovic, TingquanGao, zhang-prog, Sunting78, and AmirHosseinOmidi0 authored
Merge main (#17074)
* polish (#16665)
* polish (#16667)
* polish (#16670)
* polish (#16676)
* use FlashAttention 2.8.2 (#16689)
* polish (#16690)
* update docs (#16688)
* update docs
* add methods
* adding frigate to awesome_projects.md (#16659)
  Frigate is a real-time NVR system that uses PaddleOCR for License Plate Recognition (LPR).
  Co-authored-by: AmirHossein_Omidi <[email protected]>
* update PaddleOCR-VL paper url (#16696)
* update PaddleOCR-VL paper url
* polish README
* update doc (#16700)
* [doc] add hareware support (#16725)
* Add hardware support
* Add hardware support
* fix
* update
* update
* 109 langs dos (#16718)
* fix invalid link in doc (#16719)
* fix conflict
* fix doc2
* update fqa (#16716)
* update fqa
* Update PaddleOCR-VL.en.md
* Update PaddleOCR-VL.en.md
* Update PaddleOCR-VL.en.md
* Update PaddleOCR-VL.md
* support cinn flag (#16745)
* docs: fix_doc1 (#16752)
* Parse all local OCR result batches instead of only the first (#16756)
  Signed-off-by: Adler Fleurant <[email protected]>
* [ILUVATAR_GPU] Support for iluvatar_gpu (#16518)
* docs: fix valid link in doc1022 (#16812)
* fix_doc
* fix_doc
* update readme (#16861)
* update readme
* fix code-style for readme
* [Docs] Optimize docs for deployment of PaddleOCR-VL (#16808)
* Optimize docs for deployment of PaddleOCR-VL
* Update docs
* Fix not-using-doc-prepeocessor bug
* Update dockerfiles and docs
* Add SFT
* Fix code style
* Add PaddleOCR-VL-0.9B model into offline pipeline image
* Support Windows
* Add lower bound for paddleocr version
* Revert windows and paddle 3.2.1
* Support setting paddleocr version
* Fix typo
* Update docker image sizes
* Fix bug
* Fix doc
* add fastdeploy-server backend (#16879)
* Polish README (#16904)
* polish README
* polish
* polish badge of readme (#16909)
* update PaddleOCR-VL.md (#16926)
* update PaddleOCR-VL.md
* update
* update
* update
* update
* add en docs
* update mkdocs (#16946)
* docs: fix invalid link in doc (#16947)
* Update MCP docs (#16941)
* Fix injection vulnerability in pdf2word (#16910)
* update doc (#16776)
* Fix docs (#16898)
* [Feat] Support building SM120 images (#16919)
* Support building SM120 images
* Set VLM batch size to 4096
* Support Switching to fastdeploy backend
* Update dockerfiles
* Fix config file
* Support DCU and XPU
* Remove unused file
* Fix bugs
* Support install genai fastdeploy server deps
* Bump FD version to 2.3.0
* Fix pipeline configs
* Fix dockerfile for DCU
* Add DCU and XPU compose files
* Add XPU compose files

---------

Signed-off-by: Adler Fleurant <[email protected]>
Co-authored-by: Tingquan Gao <[email protected]>
Co-authored-by: zhang-prog <[email protected]>
Co-authored-by: Sunflower7788 <[email protected]>
Co-authored-by: AmirHossein_Omidi <[email protected]>
Co-authored-by: AmirHossein_Omidi <[email protected]>
Co-authored-by: changdazhou <[email protected]>
Co-authored-by: liuhongen1234567 <[email protected]>
Co-authored-by: Zx <[email protected]>
Co-authored-by: Adler Fleurant <[email protected]>
Co-authored-by: tianyuzhou668 <[email protected]>
Co-authored-by: cuicheng01 <[email protected]>
1 parent 88dd667 commit 95dc316

File tree

97 files changed: +1886 -763 lines changed

README.md

Lines changed: 34 additions & 11 deletions
@@ -5,17 +5,24 @@
 
 English | [简体中文](./readme/README_cn.md) | [繁體中文](./readme/README_tcn.md) | [日本語](./readme/README_ja.md) | [한국어](./readme/README_ko.md) | [Français](./readme/README_fr.md) | [Русский](./readme/README_ru.md) | [Español](./readme/README_es.md) | [العربية](./readme/README_ar.md)
 
+<!-- icon -->
 [![stars](https://img.shields.io/github/stars/PaddlePaddle/PaddleOCR?color=ccf)](https://github.com/PaddlePaddle/PaddleOCR)
-[![arXiv](https://img.shields.io/badge/arXiv-2507.05595-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2507.05595)
-[![PyPI Downloads](https://static.pepy.tech/badge/paddleocr/month)](https://pepy.tech/project/paddleocr)
-[![PyPI Downloads](https://static.pepy.tech/badge/paddleocr)](https://pepy.tech/project/paddleocr)
-[![Used by](https://img.shields.io/badge/Used%20by-5.9k%2B%20repositories-blue)](https://github.com/PaddlePaddle/PaddleOCR/network/dependents)
-
+[![forks](https://img.shields.io/github/forks/PaddlePaddle/PaddleOCR.svg)](https://github.com/PaddlePaddle/PaddleOCR)
+[![arXiv](https://img.shields.io/badge/PaddleOCR_3.0-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/pdf/2507.05595)
+[![arXiv](https://img.shields.io/badge/PaddleOCR--VL-Technical%20Report-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2510.14528)
+
+[![PyPI Downloads](https://static.pepy.tech/badge/paddleocr/month)](https://pepy.tech/projectsproject/paddleocr)
+[![PyPI Downloads](https://static.pepy.tech/badge/paddleocr)](https://pepy.tech/projects/paddleocr)
+[![Used by](https://img.shields.io/badge/Used%20by-6k%2B%20repositories-blue)](https://github.com/PaddlePaddle/PaddleOCR/network/dependents)
+[![PyPI version](https://img.shields.io/pypi/v/paddleocr)](https://pypi.org/project/paddleocr/)
 ![python](https://img.shields.io/badge/python-3.8~3.12-aff.svg)
+
 ![os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg)
 ![hardware](https://img.shields.io/badge/hardware-cpu%2C%20gpu%2C%20xpu%2C%20npu-yellow.svg)
-[![License](https://img.shields.io/badge/license-Apache_2.0-green)](./LICENSE)
+[![License](https://img.shields.io/badge/license-Apache_2.0-green)](../LICENSE)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/PaddlePaddle/PaddleOCR)
+[![AI Studio](https://img.shields.io/badge/PaddleOCR-_Offiical_Website-1927BA?logo=&labelColor=white)](https://www.paddleocr.com)
+
 
 
 **PaddleOCR is an industry-leading, production-ready OCR and document AI engine, offering end-to-end solutions from text extraction to intelligent document understanding**
@@ -32,12 +39,14 @@ English | [简体中文](./readme/README_cn.md) | [繁體中文](./readme/README
 > [!TIP]
 > PaddleOCR now provides an MCP server that supports integration with Agent applications like Claude Desktop. For details, please refer to [PaddleOCR MCP Server](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/deployment/mcp_server.html).
 >
-> The PaddleOCR 3.0 Technical Report is now available. See details at: [PaddleOCR 3.0 Technical Report](https://arxiv.org/abs/2507.05595)
+> The PaddleOCR 3.0 Technical Report is now available. See details at: [PaddleOCR 3.0 Technical Report](https://arxiv.org/abs/2507.05595).
+>
+> The PaddleOCR-VL Technical Report is now available. See details at [PaddleOCR-VL Technical Report](https://arxiv.org/abs/2510.14528).
 >
-> The PaddleOCR-VL Technical Report is now available. See details at [PaddleOCR-VL Technical Report](https://arxiv.org/abs/2510.14528)
+> The Beta version of the PaddleOCR official website is now live, offering a more convenient online experience and large-scale PDF file parsing, as well as free API and MCP services. For more details, please visit the [PaddleOCR official website](https://www.paddleocr.com).
 
 
-**PaddleOCR** converts documents and images into **structured, AI-friendly data** (like JSON and Markdown) with **industry-leading accuracy**—powering AI applications for everyone from indie developers and startups to large enterprises worldwide. With over **50,000 stars** and deep integration into leading projects like **MinerU, RAGFlow, and OmniParser**, PaddleOCR has become the **premier solution** for developers building intelligent document applications in the **AI era**.
+**PaddleOCR** converts documents and images into **structured, AI-friendly data** (like JSON and Markdown) with **industry-leading accuracy**—powering AI applications for everyone from indie developers and startups to large enterprises worldwide. With over **60,000 stars** and deep integration into leading projects like **MinerU, RAGFlow, pathway and cherry-studio**, PaddleOCR has become the **premier solution** for developers building intelligent document applications in the **AI era**.
 
 ### PaddleOCR 3.0 Core Features
 
@@ -426,15 +435,29 @@ for res in output:
 
 ## 🔄 Quick Overview of Execution Results
 
+### PP-OCRv5
+
+<div align="center">
+<p>
+<img width="100%" src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/README/PP-OCRv5_demo.gif" alt="PP-OCRv5 Demo">
+</p>
+</div>
+
+
+
+### PP-StructureV3
+
 <div align="center">
 <p>
-<img width="100%" src="./docs/images/demo.gif" alt="PP-OCRv5 Demo">
+<img width="100%" src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/README/PP-StructureV3_demo.gif" alt="PP-StructureV3 Demo">
 </p>
 </div>
 
+### PaddleOCR-VL
+
 <div align="center">
 <p>
-<img width="100%" src="./docs/images/blue_v3.gif" alt="PP-StructureV3 Demo">
+<img width="100%" src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/README/PaddleOCR-VL_demo.gif" alt="PP-StructureV3 Demo">
 </p>
 </div>
 

deploy/paddleocr_vl_docker/.env

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+API_IMAGE_TAG_SUFFIX=latest-offline
+VLM_BACKEND=vllm
+VLM_IMAGE_TAG_SUFFIX=latest-offline
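The three variables in this `.env` file fill the `${...}` placeholders used by the compose files in the same directory. As a rough illustration of the image references they produce, the bash sketch below assembles them by hand. The helper names `api_image` and `vlm_image` are invented for this note and are not part of the commit; the registry prefix is copied from `compose.yaml`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the commit): rebuild the image references
# that the compose files derive from the .env values.
REGISTRY='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle'

api_image() {
    # $1: value of API_IMAGE_TAG_SUFFIX
    printf '%s/paddleocr-vl:%s\n' "${REGISTRY}" "$1"
}

vlm_image() {
    # $1: value of VLM_BACKEND, $2: value of VLM_IMAGE_TAG_SUFFIX
    printf '%s/paddleocr-genai-%s-server:%s\n' "${REGISTRY}" "$1" "$2"
}

# Values taken from the .env file above.
api_image 'latest-offline'
vlm_image 'vllm' 'latest-offline'
```

With the defaults shown, the VLM service resolves to the `paddleocr-genai-vllm-server` image, the same one that was hard-coded before this commit.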
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+#!/usr/bin/env bash
+
+build_for_offline='false'
+paddleocr_version='>=3.3.2,<3.4'
+tag_suffix='latest'
+base_image='python:3.10'
+
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --device-type)
+            device_type="$2"
+            shift
+            shift
+            if [ "${device_type}" != 'gpu' ] && [ "${device_type}" != 'gpu-sm120' ] && [ "${device_type}" != 'dcu' ] && [ "${device_type}" != 'xpu' ]; then
+                echo "Unknown device type: ${device_type}" >&2
+                exit 2
+            fi
+            ;;
+        --offline)
+            build_for_offline='true'
+            shift
+            ;;
+        --ppocr-version)
+            paddleocr_version="==$2"
+            shift
+            shift
+            ;;
+        --tag-suffix)
+            tag_suffix="$2"
+            shift
+            shift
+            ;;
+        *)
+            echo "Unknown option: $1" >&2
+            exit 2
+            ;;
+    esac
+done
+
+if [ "${device_type}" != 'gpu' ]; then
+    tag_suffix="${tag_suffix}-${device_type}"
+fi
+
+if [ "${build_for_offline}" = 'true' ]; then
+    tag_suffix="${tag_suffix}-offline"
+fi
+
+if [ "${device_type}" = 'dcu' ]; then
+    base_image='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle-dcu:dtk24.04.1-kylinv10-gcc82'
+elif [ "${device_type}" = 'xpu' ]; then
+    base_image='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-xpu:ubuntu20-x86_64-gcc84-py310'
+fi
+
+docker build \
+    -f pipeline.Dockerfile \
+    -t "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:${tag_suffix}" \
+    --build-arg BASE_IMAGE="${base_image}" \
+    --build-arg DEVICE_TYPE="${device_type}" \
+    --build-arg BUILD_FOR_OFFLINE="${build_for_offline}" \
+    --build-arg PADDLEOCR_VERSION="${paddleocr_version}" \
+    --build-arg http_proxy="${http_proxy}" \
+    --build-arg https_proxy="${https_proxy}" \
+    --build-arg no_proxy="${no_proxy}" \
+    .
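The tag suffix in this script is built in two steps: a device suffix for anything other than `gpu`, then `-offline` when `--offline` is passed. One detail worth noting when reading the script: `device_type` has no default, so if `--device-type` is omitted the first test still succeeds and the suffix becomes `latest-`. The sketch below restates the rule as a function and adds a guard for that case. The function name `compute_tag_suffix` and the guard are assumptions made for illustration, not something the commit defines.

```shell
#!/usr/bin/env bash
# Sketch only: the suffix rules of the build script, restated as a function.
# compute_tag_suffix is a hypothetical name.
compute_tag_suffix() {
    local tag_suffix="$1" device_type="$2" build_for_offline="$3"

    # Assumed extra guard: the original script does not reject an empty device type.
    if [ -z "${device_type}" ]; then
        echo 'Missing device type' >&2
        return 2
    fi

    # Every device other than plain 'gpu' gets its name appended to the tag.
    if [ "${device_type}" != 'gpu' ]; then
        tag_suffix="${tag_suffix}-${device_type}"
    fi

    # Offline builds are marked last, matching the order in the script.
    if [ "${build_for_offline}" = 'true' ]; then
        tag_suffix="${tag_suffix}-offline"
    fi

    printf '%s\n' "${tag_suffix}"
}

compute_tag_suffix latest gpu false        # prints: latest
compute_tag_suffix latest dcu true         # prints: latest-dcu-offline
compute_tag_suffix latest gpu-sm120 false  # prints: latest-gpu-sm120
```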
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+#!/usr/bin/env bash
+
+device_type=''
+backend=''
+build_for_offline='false'
+paddleocr_version='>=3.3.2,<3.4'
+tag_suffix='latest'
+base_image=''
+
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --device-type)
+            device_type="$2"
+            shift
+            shift
+            if [ "${device_type}" != 'gpu' ] && [ "${device_type}" != 'gpu-sm120' ] && [ "${device_type}" != 'dcu' ] && [ "${device_type}" != 'xpu' ]; then
+                echo "Unknown device type: ${device_type}" >&2
+                exit 2
+            fi
+            ;;
+        --backend)
+            backend="$2"
+            shift
+            shift
+            if [ "${backend}" != 'vllm' ] && [ "${backend}" != 'fastdeploy' ]; then
+                echo "Unknown backend: ${backend}" >&2
+                exit 2
+            fi
+            ;;
+        --offline)
+            build_for_offline='true'
+            shift
+            ;;
+        --ppocr-version)
+            paddleocr_version="==$2"
+            shift
+            shift
+            ;;
+        --tag-suffix)
+            tag_suffix="$2"
+            shift
+            shift
+            ;;
+        *)
+            echo "Unknown option: $1" >&2
+            exit 2
+            ;;
+    esac
+done
+
+if [ "${device_type}" != 'gpu' ]; then
+    tag_suffix="${tag_suffix}-${device_type}"
+fi
+
+if [ "${build_for_offline}" = 'true' ]; then
+    tag_suffix="${tag_suffix}-offline"
+fi
+
+if [ "${backend}" = 'vllm' ]; then
+    if [ "${device_type}" = 'gpu' ]; then
+        base_image='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server:latest'
+    elif [ "${device_type}" = 'gpu-sm120' ]; then
+        base_image='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server:latest-sm120'
+    elif [ "${device_type}" = 'dcu' ]; then
+        base_image='image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10'
+    elif [ "${device_type}" = 'xpu' ]; then
+        base_image=''
+    else
+        base_image=''
+    fi
+elif [ "${backend}" = 'fastdeploy' ]; then
+    if [ "${device_type}" = 'gpu' ]; then
+        base_image='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.3.0'
+    elif [ "${device_type}" = 'gpu-sm120' ]; then
+        base_image=''
+    elif [ "${device_type}" = 'dcu' ]; then
+        base_image=''
+    elif [ "${device_type}" = 'xpu' ]; then
+        base_image='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.3.0'
+    else
+        base_image=''
+    fi
+else
+    echo "Unknown backend: ${backend}" >&2
+    exit 1
+fi
+
+if [ -z "${base_image}" ]; then
+    echo "Backend '${backend}' does not support device type '${device_type}'" >&2
+    exit 2
+fi
+
+docker build \
+    -f vlm.Dockerfile \
+    -t "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-${backend}-server:${tag_suffix}" \
+    --build-arg BASE_IMAGE="${base_image}" \
+    --build-arg BUILD_FOR_OFFLINE="${build_for_offline}" \
+    --build-arg PADDLEOCR_VERSION="${paddleocr_version}" \
+    --build-arg BACKEND="${backend}" \
+    --build-arg http_proxy="${http_proxy}" \
+    --build-arg https_proxy="${https_proxy}" \
+    --build-arg no_proxy="${no_proxy}" \
+    .
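The nested `if`/`elif` chain in this script encodes a small support matrix: vLLM is available for `gpu`, `gpu-sm120`, and `dcu`; FastDeploy for `gpu` and `xpu`. The same matrix can be read more easily as a single `case` over a `backend:device` key, sketched below. The function name `resolve_base_image` is hypothetical, and unlike the script, the sketch returns status 2 for every unsupported pair, including an unknown backend (the script exits with 1 in that branch).

```shell
#!/usr/bin/env bash
# Sketch: the backend/device matrix from the build script as one case statement.
# resolve_base_image is a hypothetical helper, not part of the commit.
resolve_base_image() {
    local backend="$1" device_type="$2"
    local registry='ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle'

    case "${backend}:${device_type}" in
        vllm:gpu)
            echo "${registry}/paddlex-genai-vllm-server:latest" ;;
        vllm:gpu-sm120)
            echo "${registry}/paddlex-genai-vllm-server:latest-sm120" ;;
        vllm:dcu)
            echo 'image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.2-py3.10' ;;
        fastdeploy:gpu)
            echo "${registry}/fastdeploy-cuda-12.6:2.3.0" ;;
        fastdeploy:xpu)
            echo "${registry}/fastdeploy-xpu:2.3.0" ;;
        *)
            # Any pair not listed above has no base image in the script.
            echo "Backend '${backend}' does not support device type '${device_type}'" >&2
            return 2
            ;;
    esac
}

resolve_base_image vllm gpu
resolve_base_image fastdeploy xpu
```

Adding a new supported combination then means adding one pattern line rather than editing two levels of conditionals.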

deploy/paddleocr_vl_docker/compose.yaml

Lines changed: 12 additions & 5 deletions
@@ -1,11 +1,11 @@
 services:
   paddleocr-vl-api:
-    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest-offline
+    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:${API_IMAGE_TAG_SUFFIX}
     container_name: paddleocr-vl-api
     ports:
       - 8080:8080
     depends_on:
-      paddleocr-genai-vllm-server:
+      paddleocr-vlm-server:
         condition: service_healthy
     deploy:
       resources:
@@ -14,20 +14,27 @@ services:
             - driver: nvidia
               device_ids: ["0"]
               capabilities: [gpu]
+    # TODO: Allow using a regular user
+    user: root
     restart: unless-stopped
+    environment:
+      - VLM_BACKEND=${VLM_BACKEND:-vllm}
+    command: /bin/bash -c "paddlex --serve --pipeline /home/paddleocr/pipeline_config_${VLM_BACKEND}.yaml"
     healthcheck:
       test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
 
-  paddleocr-genai-vllm-server:
-    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-vllm-server:latest-offline
-    container_name: paddleocr-genai-vllm-server
+  paddleocr-vlm-server:
+    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-${VLM_BACKEND}-server:${VLM_IMAGE_TAG_SUFFIX}
+    container_name: paddleocr-vlm-server
     deploy:
       resources:
         reservations:
           devices:
             - driver: nvidia
               device_ids: ["0"]
               capabilities: [gpu]
+    # TODO: Allow using a regular user
+    user: root
     restart: unless-stopped
     healthcheck:
       test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
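In this hunk the `environment` entry uses `${VLM_BACKEND:-vllm}` (with a fallback), while the `command` and the VLM `image` use the bare `${VLM_BACKEND}`. Compose borrows this syntax from the shell, and, as far as its interpolation rules go, an unset variable without a default is replaced by an empty string. Under that assumption, starting the stack without the `.env` file would yield a config path with no backend name in it. The sketch below reproduces the two forms with plain bash parameter expansion; it does not invoke Docker Compose, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustration with shell parameter expansion only; Docker Compose is not run here.
# Both function names are hypothetical.
config_path_bare() {
    # Same form as the `command:` line: no default value.
    printf '/home/paddleocr/pipeline_config_%s.yaml\n' "${VLM_BACKEND}"
}

config_path_with_default() {
    # Same form as the `environment:` entry: falls back to 'vllm'.
    printf '/home/paddleocr/pipeline_config_%s.yaml\n' "${VLM_BACKEND:-vllm}"
}

unset VLM_BACKEND
config_path_bare            # prints: /home/paddleocr/pipeline_config_.yaml
config_path_with_default    # prints: /home/paddleocr/pipeline_config_vllm.yaml

VLM_BACKEND=fastdeploy
config_path_bare            # prints: /home/paddleocr/pipeline_config_fastdeploy.yaml
```

With the `.env` file from this commit in place, `VLM_BACKEND` is always set, so both forms resolve to the same backend.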
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+services:
+  paddleocr-vl-api:
+    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:${API_IMAGE_TAG_SUFFIX}
+    container_name: paddleocr-vl-api
+    ports:
+      - 8080:8080
+    depends_on:
+      paddleocr-vlm-server:
+        condition: service_healthy
+    user: root
+    privileged: true
+    devices:
+      - /dev/kfd
+      - /dev/dri
+      - /dev/mkfd
+    group_add:
+      - video
+    cap_add:
+      - SYS_PTRACE
+    security_opt:
+      - seccomp=unconfined
+    restart: unless-stopped
+    environment:
+      - VLM_BACKEND=${VLM_BACKEND:-vllm}
+    command: /bin/bash -c "paddlex --serve --pipeline /home/paddleocr/pipeline_config_${VLM_BACKEND}.yaml"
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
+    volumes:
+      - /opt/hyhal/:/opt/hyhal/:ro
+    shm_size: 64G
+
+  paddleocr-vlm-server:
+    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-${VLM_BACKEND}-server:${VLM_IMAGE_TAG_SUFFIX}
+    container_name: paddleocr-vlm-server
+    user: root
+    privileged: true
+    devices:
+      - /dev/kfd
+      - /dev/dri
+      - /dev/mkfd
+    group_add:
+      - video
+    cap_add:
+      - SYS_PTRACE
+    security_opt:
+      - seccomp=unconfined
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
+      start_period: 300s
+    volumes:
+      - /opt/hyhal/:/opt/hyhal/:ro
+    shm_size: 64G
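In this file the API service waits on `depends_on` with `condition: service_healthy`, and the VLM service's healthcheck gets a `start_period` of 300s, so the API container is only started once the model server answers on `/health`. A client script running outside Compose may want a similar wait before sending requests. The sketch below is one way to do that; `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary choices rather than values from the commit.

```shell
#!/usr/bin/env bash
# wait_until is a hypothetical helper: retry a command until it succeeds
# or the attempt budget is used up.
wait_until() {
    local attempts="$1" delay="$2"
    shift 2

    local i
    for ((i = 1; i <= attempts; i++)); do
        # Run the remaining arguments as the probe command.
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if ((i < attempts)); then
            sleep "${delay}"
        fi
    done
    return 1
}
```

A call such as `wait_until 60 5 curl -fsS -o /dev/null http://localhost:8080/health` would poll for roughly five minutes. Only the API service publishes port 8080 to the host in this file, so from the host that URL reaches the API container.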
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+services:
+  paddleocr-vl-api:
+    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:${API_IMAGE_TAG_SUFFIX}
+    container_name: paddleocr-vl-api
+    ports:
+      - 8080:8080
+    depends_on:
+      paddleocr-vlm-server:
+        condition: service_healthy
+    user: root
+    restart: unless-stopped
+    environment:
+      - VLM_BACKEND=${VLM_BACKEND:-vllm}
+    command: /bin/bash -c "paddlex --serve --pipeline /home/paddleocr/pipeline_config_${VLM_BACKEND}.yaml"
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
+    volumes:
+      - /usr/local/Ascend/driver:/usr/local/Ascend/driver
+      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
+      - /usr/local/dcmi:/usr/local/dcmi
+    privileged: true
+    shm_size: 64G
+
+  paddleocr-vlm-server:
+    image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-genai-${VLM_BACKEND}-server:${VLM_IMAGE_TAG_SUFFIX}
+    container_name: paddleocr-vlm-server
+    user: root
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
+      start_period: 300s
+    volumes:
+      - /usr/local/Ascend/driver:/usr/local/Ascend/driver
+      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
+      - /usr/local/dcmi:/usr/local/dcmi
+    privileged: true
+    shm_size: 64G
