Add pdf_oxide to benchmark suite#19

Open
yfedoseev wants to merge 1 commit into py-pdf:main from yfedoseev:add-pdf-oxide
@yfedoseev

Adds pdf_oxide to the benchmark suite with text extraction and image extraction support.

Changes

  • pdf_benchmark/library_code.py: Added pdf_oxide_get_text() and pdf_oxide_image_extraction() functions (using the same tempfile approach as pdftotext, since pdf_oxide accepts file paths)
  • benchmark.py: Registered pdf_oxide as a library with imports
  • requirements/main.in: Added pdf-oxide dependency
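
For readers unfamiliar with the tempfile approach mentioned above, here is a minimal sketch of the general pattern. It is not the code from this PR: `get_text_via_tempfile` and the `extract_from_path` callable are hypothetical names, and the callable stands in for whatever path-based call the library actually exposes.

```python
import os
import tempfile
from typing import Callable


def get_text_via_tempfile(data: bytes, extract_from_path: Callable[[str], str]) -> str:
    """Write PDF bytes to a temporary file, then run a path-based extractor on it.

    `extract_from_path` is a hypothetical stand-in for a library call that
    only accepts a file path (an assumption, not the pdf_oxide API).
    """
    # delete=False lets the file be reopened by name on every platform.
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        tmp.write(data)
        tmp.close()  # flush and release the handle before the extractor opens the path
        return extract_from_path(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.unlink(tmp.name)  # always clean up, even if extraction raised
```

The cost of the temporary file is included in whatever timing wraps this function, which is worth keeping in mind when comparing against libraries that read directly from bytes.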

About pdf_oxide

  • Rust core with Python bindings via PyO3
  • MIT / Apache-2.0 licensed
  • Text extraction, image extraction, markdown conversion
  • v0.3.6 released: PyPI | GitHub

Adds text extraction and image extraction benchmarks for pdf_oxide,
a Rust-powered PDF library with Python bindings.

- Text extraction via tempfile (pdf_oxide accepts file paths)
- Image extraction with format-aware naming
- MIT/Apache-2.0 licensed
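
One possible reading of "format-aware naming" is choosing the file extension from the image data itself. The sketch below illustrates that idea under stated assumptions: `image_filename` is a hypothetical helper, the `image-{index}` naming scheme is invented for illustration, and the PR may derive the format differently (for example from metadata the library returns).

```python
def image_filename(index: int, data: bytes) -> str:
    """Pick a file name whose extension matches the image's leading magic bytes.

    Hypothetical helper: the naming scheme and the set of formats are
    assumptions, not taken from the PR.
    """
    signatures = [
        (b"\x89PNG\r\n\x1a\n", "png"),  # PNG signature
        (b"\xff\xd8\xff", "jpg"),       # JPEG start-of-image marker
        (b"GIF8", "gif"),               # GIF87a / GIF89a
    ]
    for magic, ext in signatures:
        if data.startswith(magic):
            return f"image-{index}.{ext}"
    # Unknown format: fall back to a neutral extension rather than guessing.
    return f"image-{index}.bin"
```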

https://github.com/yfedoseev/pdf_oxide
https://pypi.org/project/pdf-oxide/