Dev/add two stage vlm #1908

PeterStaar-IBM · 2025-07-08T07:50:19Z

Idea: add a two-stage approach: first run the layout model on the page images, then pass the predicted clusters as input (through an updated prompt) to the VLM. This approach was first explored in Dolphin (albeit with the same VLM).

The hope here is to improve the overall recall of VLMs.

This can be tested via the CLI:

uv run docling --pipeline vlm --vlm-model vlm2stage <path-to-doc>

TBD: how do we properly build a prompt that the VLM can use?
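To make the open prompt question concrete, here is a minimal sketch of one way the predicted clusters could be serialized into a prompt. It is not the implementation in this PR: the `Cluster` class, the `build_prompt` function, the 0-1000 coordinate grid, and the prompt wording are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical stand-in for one layout prediction (not a docling type)."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def build_prompt(clusters, page_width, page_height, grid=1000):
    """Serialize layout clusters into a text prompt (illustrative wording only)."""
    lines = [
        "Convert this page to structured text.",
        f"A layout model predicted these regions (label: x0,y0,x1,y1 on a 0-{grid} grid):",
    ]
    # Order regions top-to-bottom, then left-to-right, as a rough reading order.
    ordered = sorted(clusters, key=lambda c: (c.top, c.left))
    for index, c in enumerate(ordered, start=1):
        # Normalize pixel coordinates so the prompt is independent of image size.
        x0 = round(c.left * grid / page_width)
        y0 = round(c.top * grid / page_height)
        x1 = round(c.right * grid / page_width)
        y1 = round(c.bottom * grid / page_height)
        lines.append(f"{index}. {c.label}: {x0},{y0},{x1},{y1}")
    lines.append("Transcribe the content of every region, in the order listed.")
    return "\n".join(lines)


example_clusters = [
    Cluster("text", left=60, top=120, right=540, bottom=400),
    Cluster("title", left=60, top=40, right=540, bottom=80),
]
example_prompt = build_prompt(example_clusters, page_width=600, page_height=800)
```

Normalizing to a fixed grid is only one option; whether the VLM responds better to normalized coordinates, raw pixels, or special location tokens would need to be tested.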

Checklist:

Documentation has been updated, if necessary.
Examples have been added, if necessary.
Tests have been added, if necessary.

Signed-off-by: Peter Staar <[email protected]>

github-actions · 2025-07-08T07:50:31Z

✅ DCO Check Passed

Thanks @PeterStaar-IBM, all your commits are properly signed off. 🎉

mergify · 2025-07-08T07:50:54Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce conventional commit

This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
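As a quick illustration of why this rule fails, the pattern above can be checked against the PR title locally. This is a sketch using Python's `re` module with the same pattern; the helper name `is_conventional` is an assumption, and Mergify's own regex engine may differ in edge cases.

```python
import re

# Same pattern as in the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


pr_title_ok = is_conventional("Dev/add two stage vlm")           # the title of this PR
commit_title_ok = is_conventional("feat: add TwoStageVlmModel")  # first commit in this PR
```

The PR title has no `type:` prefix, so it does not match, while a title shaped like the first commit message would.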

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

#approved-reviews-by >= 2

codecov · 2025-07-10T13:30:28Z

Codecov Report

❌ Patch coverage is 45.40816% with 107 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
docling/models/vlm_models_inline/mlx_model.py	13.15%	33 Missing ⚠️
...ng/models/vlm_models_inline/two_stage_vlm_model.py	38.46%	32 Missing ⚠️
.../models/vlm_models_inline/hf_transformers_model.py	20.68%	23 Missing ⚠️
docling/pipeline/vlm_pipeline.py	20.00%	12 Missing ⚠️
docling/models/base_model.py	77.77%	4 Missing ⚠️
docling/cli/main.py	0.00%	3 Missing ⚠️
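The percentages and missing-line counts in this report can be cross-checked with a little arithmetic. The sketch below reconstructs the number of changed lines per file from the two reported columns; the reconstructed totals are inferences from the rounded figures shown here, not values reported by Codecov, and the class and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileCoverage:
    path: str
    patch_pct: float  # reported patch coverage in percent
    missing: int      # reported number of uncovered lines

    def estimated_total(self) -> int:
        # missing = total * (1 - pct/100), so total = missing / (1 - pct/100).
        # Only valid for rows with pct < 100, which holds for every row listed.
        return round(self.missing / (1 - self.patch_pct / 100))


# Paths are copied as displayed in the report, including the truncated ones.
rows = [
    FileCoverage("docling/models/vlm_models_inline/mlx_model.py", 13.15, 33),
    FileCoverage("...ng/models/vlm_models_inline/two_stage_vlm_model.py", 38.46, 32),
    FileCoverage(".../models/vlm_models_inline/hf_transformers_model.py", 20.68, 23),
    FileCoverage("docling/pipeline/vlm_pipeline.py", 20.00, 12),
    FileCoverage("docling/models/base_model.py", 77.77, 4),
    FileCoverage("docling/cli/main.py", 0.00, 3),
]

total_missing = sum(r.missing for r in rows)
listed_total = sum(r.estimated_total() for r in rows)

# Overall patch size inferred from the headline figure (45.40816%, 107 missing).
patch_total = round(107 / (1 - 45.40816 / 100))
fully_covered_elsewhere = patch_total - listed_total
```

Under these assumptions the per-file missing counts add up to the headline 107, and the gap between `patch_total` and `listed_total` would correspond to changed lines in files that are fully covered and therefore not listed.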

PeterStaar-IBM · 2025-07-10T14:21:24Z

@dolfim-ibm @cau-git Let's start the review. This is functionally working, but it now needs some overall thinking.

PeterStaar-IBM · 2025-08-18T08:07:30Z

Closing for now; superseded by #2084.

PeterStaar-IBM added 2 commits July 8, 2025 07:38

feat: add TwoStageVlmModel

4eceefa

Signed-off-by: Peter Staar <[email protected]>

feat: working on a two stage VLM model

810446c

Signed-off-by: Peter Staar <[email protected]>

PeterStaar-IBM requested review from cau-git, dolfim-ibm and nikos-livathinos July 8, 2025 07:50

PeterStaar-IBM self-assigned this Jul 8, 2025

cau-git and others added 12 commits July 8, 2025 10:23

Establish layout_model spec and example instantations

f2094f8

Signed-off-by: Christoph Auer <[email protected]>

Move to pipeline_options.layout_options.model

af0461e

Signed-off-by: Christoph Auer <[email protected]>

Updated naming

517230b

Signed-off-by: Christoph Auer <[email protected]>

merged in layout-model-spec

49e9a00

Signed-off-by: Peter Staar <[email protected]>

working on MyPy

b5479ab

Signed-off-by: Peter Staar <[email protected]>

refactoring redundant code and fixing mypy errors

c10e292

Signed-off-by: Peter Staar <[email protected]>

fixed the MyPy complaining

dcf6fd6

Signed-off-by: Peter Staar <[email protected]>

refactored the code and added vlm2stage as a cli option

0f39568

Signed-off-by: Peter Staar <[email protected]>

Merge branch 'main' into dev/add-two-stage-vlm

e596143

merged with main and refactored the code to fix MyPy

70872e6

Signed-off-by: Peter Staar <[email protected]>

fixed the circular dependenciea

b233683

Signed-off-by: Peter Staar <[email protected]>

working TwoStageVlmModel

fb74d0c

Signed-off-by: Peter Staar <[email protected]>

PeterStaar-IBM added 2 commits July 10, 2025 15:38

working two-stage vlm approach from the cli

b2d5c78

Signed-off-by: Peter Staar <[email protected]>

functional working two-stage, need to implement a good prompt now to …

f4c1836

…leverage bounding boxes Signed-off-by: Peter Staar <[email protected]>

PeterStaar-IBM marked this pull request as ready for review July 10, 2025 14:20

PeterStaar-IBM closed this Aug 18, 2025
