
Contributor

@PeterStaar-IBM PeterStaar-IBM commented Jul 8, 2025

Idea: add a two-stage approach: first run the layout model on the page images, then pass the predicted clusters as input (through an updated prompt) to the VLM. This approach was first explored in Dolphin (albeit with the same VLM).

The hope is to improve the overall recall of VLMs.

It can be tested via the CLI:

uv run docling --pipeline vlm --vlm-model vlm2stage <path-to-doc>

TBD: how do we properly build a prompt that the VLM can use?
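
As a starting point for that discussion, here is a minimal sketch of how predicted clusters could be serialized into a prompt. Everything in it is an assumption for illustration: the `Cluster` class, the `build_prompt` function, the 0-1000 coordinate grid, and the prompt wording are hypothetical and are not taken from this PR or from the docling API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical stand-in for one layout-model prediction on a page image."""

    label: str  # e.g. "title", "text", "table"
    bbox: tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def build_prompt(clusters: list[Cluster], page_width: float, page_height: float) -> str:
    """Serialize clusters into a text prompt (illustrative wording only)."""
    # Sort top-to-bottom, then left-to-right, as a crude reading order.
    ordered = sorted(clusters, key=lambda c: (c.bbox[1], c.bbox[0]))
    lines = ["Convert this page to structured text. The layout model found these regions:"]
    for index, cluster in enumerate(ordered, start=1):
        left, top, right, bottom = cluster.bbox
        # Normalize to a 0-1000 grid so the prompt does not depend on image resolution.
        norm = (
            round(1000 * left / page_width),
            round(1000 * top / page_height),
            round(1000 * right / page_width),
            round(1000 * bottom / page_height),
        )
        lines.append(f"{index}. {cluster.label} at {norm}")
    lines.append("Transcribe every region listed above, in this order.")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Cluster("text", (50, 200, 550, 400)),
        Cluster("title", (50, 40, 550, 100)),
    ]
    print(build_prompt(demo, page_width=600, page_height=800))
```

Listing every region explicitly is one way to target recall, since the VLM is told what the layout model found rather than having to discover it. Whether a given VLM actually follows such a list would need to be checked empirically.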

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Contributor

github-actions bot commented Jul 8, 2025

DCO Check Passed

Thanks @PeterStaar-IBM, all your commits are properly signed off. 🎉


mergify bot commented Jul 8, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce conventional commit

This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2


codecov bot commented Jul 10, 2025

@PeterStaar-IBM PeterStaar-IBM marked this pull request as ready for review July 10, 2025 14:20
@PeterStaar-IBM
Contributor Author

@dolfim-ibm @cau-git Let's start the review. This is functionally working, but it now needs some overall thinking.

@PeterStaar-IBM
Contributor Author

Closing for now; superseded by #2084.


2 participants