dependabot[bot] commented on behalf of github on Apr 28, 2025

Updates the requirements on pymupdf4llm to permit the latest version.

Release notes

Sourced from pymupdf4llm's releases.

Version 0.0.22

See this file.

Changelog

Sourced from pymupdf4llm's changelog.

Changes in version 0.0.22

Fixes:

  • 251 - Images a little larger than the page size are being ignored
  • 255 - Single-row/column tables are skipped
  • 258 - Pymupdf4llm to_markdown crashes on some documents

Other Changes:

  • Added class TocHeaders as an alternative way for identifying headers.

Changes in version 0.0.21

Fixes:

  • 116 - Handling Graphical Images & Superscripts

Other Changes:

Changes in version 0.0.20

Fixes:

  • 171 - Text rects overlap with tables and images that should be excluded.
  • 189 - The position of the extracted image is incorrect
  • 238 - When text is laid out around the picture, text extraction is missing.

Other Changes:

  • Added new parameter ignore_images: (bool) optional. True will not consider images in any way. May be useful for pages where a plethora of images prevents meaningful layout analysis. Typical examples are PowerPoint slides and derived / similar pages.

  • Added new parameter ignore_graphics: (bool), optional. True will not consider graphics except for table detection. May be useful for pages where a plethora of vector graphics prevents meaningful layout analysis. Typical examples are PowerPoint slides and derived / similar pages.

  • Added new parameter to class IdentifyHeaders: Use max_levels (integer <= 6) to limit the generation of header tag levels. For example, headers = pymupdf4llm.IdentifyHeaders(doc, max_levels=3) ensures that only up to 3 header levels will ever be generated. Any text with a font size smaller than that of a level-3 header (###) will be treated as body text. In this case, the markdown generation itself would be coded as md = pymupdf4llm.to_markdown(doc, hdr_info=headers, ...).

  • Changed parameter table_strategy: When None is specified, no attempt to detect tables will be made. This can be useful when tables are of no interest or are known not to exist in a given file, and it speeds up processing significantly. Be prepared to see more changes and extensions here.
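The options introduced in 0.0.20 can be combined in a single call. The following is a minimal sketch, not taken from the pymupdf4llm documentation: `build_to_markdown_options` and `convert` are hypothetical helper names, the `slide_like` and `tables_expected` flags are illustrative, and the keyword is spelled `max_levels` as in the parameter name the changelog gives. It also assumes a PyMuPDF release that can be imported as `pymupdf`. Check the signatures against the installed version before relying on them.

```python
def build_to_markdown_options(slide_like=False, tables_expected=True):
    """Collect keyword arguments for pymupdf4llm.to_markdown.

    Hypothetical helper: both flags are illustrative and not part of the
    pymupdf4llm API. Only the returned keys come from the changelog.
    """
    options = {}
    if slide_like:
        # Pages dominated by images or vector graphics (e.g. exported
        # slides) can defeat layout analysis, so ignore both.
        options["ignore_images"] = True
        options["ignore_graphics"] = True
    if not tables_expected:
        # Per the changelog, None disables table detection entirely.
        options["table_strategy"] = None
    return options


def convert(path, max_levels=3, **options):
    """Convert a PDF to Markdown with at most `max_levels` header levels."""
    # Imported lazily so the helper above can be used without the libraries.
    import pymupdf
    import pymupdf4llm

    doc = pymupdf.open(path)
    # Assumption: the keyword is `max_levels`, as named in the changelog text.
    headers = pymupdf4llm.IdentifyHeaders(doc, max_levels=max_levels)
    return pymupdf4llm.to_markdown(doc, hdr_info=headers, **options)
```

A call such as `convert("slides.pdf", **build_to_markdown_options(slide_like=True, tables_expected=False))` would then skip images, graphics, and table detection; the file name here is a placeholder.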

Changes in version 0.0.19

Fixes:

The following list includes fixes made in version 0.0.18 already.

  • 158 - Very long titles when converting to markdown.
  • 155 - Inconsistent image extraction from image-only PDFs
  • 161 - force_text param ignored.
  • 162 - to_markdown isn't outputting all the pages but get_text is.

... (truncated)

Commits
  • c26b2c1 Updates for v0.0.22
  • 5a0679d Version 0.0.21
  • 4e599ed pass load_kwargs to '_process_doc_page' and 'to_markdown' to enable write_ima...
  • 05becd8 Changes Version 0.0.20
  • 8460a9f Remove typos
  • 2ec62ae Addresses multiple issues
  • 3ad7edf Ignore Graphics only
  • 3dd3429 Merge branch 'main' into v0.0.18
  • 7a53eb7 Mutiple Fixes
  • 85d0f57 Sets default cwidth to half the font size
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [pymupdf4llm](https://github.com/pymupdf/RAG) to permit the latest version.
- [Release notes](https://github.com/pymupdf/RAG/releases)
- [Changelog](https://github.com/pymupdf/RAG/blob/main/CHANGES.md)
- [Commits](pymupdf/pymupdf4llm@v0.0.8...v0.0.22)

---
updated-dependencies:
- dependency-name: pymupdf4llm
  dependency-version: 0.0.22
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on Apr 28, 2025