
Conversation

@DariaMityagina (Contributor) commented Oct 10, 2025

Details:

The task is to explore moving the reshaping process into the plugin. The compiler would then receive either a network with batch size 1 (if reshaping succeeds) or a network with a non-1 batch size (if reshaping fails). This approach aims to simplify batch handling and to reduce dependencies on the compiler.

Flow:

  1. Plugin receives a dynamically batched model
  2. Plugin tries to reshape it to batch=1
  3. If success, plugin invokes the compiler to compile the model with batch=1
  4. Plugin dumps the blob in the FS
  5. A user calls benchmark_app -m model.blob -data_shape [4, 3, 224, 224]
  6. Plugin loads the model onto the device
  7. Plugin sees the batch -> it creates N (4) infer requests
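
For illustration, here is a minimal C++ sketch of steps 1-3, assuming a hypothetical debatchIfPossible() helper (the real plugin internals differ):

```cpp
#include <openvino/openvino.hpp>

// Hypothetical helper: try to reshape the incoming model to batch = 1.
// Returns a batch-1 clone on success, or the untouched original model
// when reshaping fails (the compiler then handles batching itself).
std::shared_ptr<ov::Model> debatchIfPossible(const std::shared_ptr<ov::Model>& model) {
    auto clone = model->clone();  // never mutate the user's model
    try {
        ov::set_batch(clone, 1);  // step 2: attempt the reshape to batch = 1
        return clone;             // step 3: this clone goes to the compiler
    } catch (const ov::Exception&) {
        return model;             // reshape failed: keep the non-1 batch
    }
}
```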

Tickets:

  • E-176749

Metadata<CURRENT_METADATA_VERSION>(blobSizesBeforeVersioning,
CURRENT_OPENVINO_VERSION,
initBlobSizes,
originalBatchSize)
Contributor: Can we pass here information from the CompiledModel itself, not from the _graph?

Contributor Author: Done, thanks! 93181ac

Contributor:

Suggested change
originalBatchSize)
_batchSize)

@DariaMityagina DariaMityagina changed the title [Dynamic batch] Investigate refactoring opportunities for batch management in Plugin and Compiler [Dynamic batch] Investigate refactoring opportunities for batch management in Plugin and Compiler - new metadata version Oct 10, 2025
@DariaMityagina DariaMityagina marked this pull request as ready for review October 10, 2025 09:47
@sivanov-work (Contributor) left a comment:

LGTM with minor comments

@DariaMityagina DariaMityagina added this to the 2025.4 milestone Oct 10, 2025
@DariaMityagina DariaMityagina self-assigned this Oct 10, 2025
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-metadata-used branch from 3d11a35 to 93181ac Compare October 13, 2025 07:18
@DariaMityagina (Contributor Author):

@PatrikStepan hi! Could you please take another look at this PR?

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-metadata-used branch 2 times, most recently from 61e0dd0 to b5a2daa Compare October 13, 2025 10:57
Comment on lines 764 to 766
// If we have successfully debatched the model on the PLUGIN side, we should
// avoid repeating the same in the compiler by resetting the batch mode
updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
Contributor:

@DariaMityagina @pereanub what if we remove BATCH_MODE from the config here? From the compiler source code, it seems to have the same effect.
BATCH_MODE is still a private property; today it is added to the list of compiler configs only by the plugin itself. With the changes from this PR, maybe it should be added only when the fallback to the compiler implementation is required?

Contributor Author:

With changes from this PR maybe it should be added only when the fallback on the compiler implementation is required?

That's exactly what's happening here, let's take a closer look.

If BATCH_MODE = AUTO or PLUGIN:

  • Verify if the model is compatible with batching on the plugin side
  • Attempt to reshape the model using set_batch(1)
  • If this is successful, we continue with the PLUGIN BATCH_MODE
    • Write/Read batch size into NPU plugin metadata
    • The reshaped model is sent to the compiler. BATCH_MODE must be set to COMPILER
    • Adjust the metadata post-compilation to retain the original batch value
  • If it isn't successful:
    • If BATCH_MODE = AUTO: we fall back to BATCH_MODE = COMPILER
    • If BATCH_MODE = PLUGIN: we fail
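
A condensed sketch of that decision logic, assuming the BatchMode enum from the plugin's private properties and a hypothetical tryPluginDebatch() helper standing in for the validation plus set_batch(1) attempt (not the actual plugin code):

```cpp
#include <openvino/openvino.hpp>
#include <intel_npu/npu_private_properties.hpp>  // assumed location of BatchMode

bool tryPluginDebatch(const std::shared_ptr<ov::Model>& model);  // hypothetical

void resolveBatchMode(ov::intel_npu::BatchMode mode, const std::shared_ptr<ov::Model>& model) {
    if (mode == ov::intel_npu::BatchMode::COMPILER) {
        return;  // the compiler applies its own batching mechanism
    }
    if (tryPluginDebatch(model)) {
        // PLUGIN path: the batch size goes into the NPU plugin metadata,
        // and the reshaped (batch-1) model is sent to the compiler
        return;
    }
    if (mode == ov::intel_npu::BatchMode::AUTO) {
        return;  // fall back to the compiler implementation
    }
    OPENVINO_THROW("BATCH_MODE = PLUGIN, but the model could not be debatched");
}
```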

@PatrikStepan (Contributor) commented Oct 13, 2025:

Right, I was challenging this part: "BATCH_MODE must be set to COMPILER". If BATCH_MODE is not found in the config, the compiler will attempt to apply its own batching mechanism anyway.
Can BATCH_MODE remain a hint only for the plugin?

  • If set to PLUGIN: try to debatch the model; if debatching succeeds, pass the modified model to the compiler without any extra configs, throw otherwise
  • If set to AUTO or not set at all: try to debatch the model; if debatching succeeds, pass the modified model to the compiler without any extra configs, fall back to the compiler implementation otherwise
  • If set to COMPILER, or on fallback to the compiler implementation: pass the original model to the compiler without any extra configs

@DariaMityagina (Contributor Author) commented Oct 13, 2025:

I see. I removed the BATCH_MODE must be set to COMPILER requirement after successful debatching (here fb6363b).

However, I think it still makes sense to pass the config value for the remaining cases to avoid duplicating the plugin-side validation logic in the compiler. This way we can directly use the updated value from the plugin instead of re-implementing/repeating the same checks just to set BATCH_MODE.

At least in the current state. We can adjust this implementation later through a dedicated task with focused improvements.

Thanks!

Contributor Author:

Discussed offline.

Contributor Author:

NPU_BATCH_MODE is now a Runtime option: 977e1b1

Thanks!

Contributor Author:

Not applicable anymore.


// Handle batch mode configuration
std::optional<ov::Dimension> originalBatch = std::nullopt;
if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
Contributor:

BATCH_MODE is not available when the compiler does not support this option, but batching could still be handled on the plugin side (in some cases at least). Isn't that the actual requirement that needs to be addressed in this PR?


@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-metadata-used branch 4 times, most recently from 977e1b1 to 081c268 Compare October 15, 2025 02:13
@pereanub (Contributor) left a comment:

It is not very clear to me how you handle the fixed batch size case today.
E.g.: the batch size of the network I/O is not dynamic but fixed to 3x... How is this supposed to work today?

@DariaMityagina (Contributor Author) commented Oct 15, 2025

It is not very clear to me how you handle the fixed batch size case today. E.g.: the batch size of the network I/O is not dynamic but fixed to 3x... How is this supposed to work today?

@pereanub, static batch size is fixed here:

Instead of:

So, when batch=3 (static) is used, e.g.:

  • We run validation checks to determine if the model is compatible with plugin batching
  • We identify the batch_size, which equals 3 in this scenario
  • We call graph->set_batch_size(batchSize.value());
  • Within zero_infer_request, we call _graph->get_batch_size() (this returns the previously set value of 3)
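
In rough C++, the static-batch path described above boils down to something like this; set_batch_size()/get_batch_size() mirror the IGraph methods mentioned in this thread, while the surrounding names are stand-ins:

```cpp
#include <memory>
#include <openvino/openvino.hpp>
#include <intel_npu/common/igraph.hpp>  // assumed location of IGraph

// Stand-in sketch of the static-batch handling, not the actual plugin code.
void handleStaticBatch(const std::shared_ptr<const ov::Model>& model,
                       const std::shared_ptr<intel_npu::IGraph>& graph) {
    const auto& batchDim = model->input(0).get_partial_shape()[0];
    if (batchDim.is_static() && batchDim.get_length() > 1) {
        graph->set_batch_size(batchDim.get_length());  // e.g. 3 for a 3xCxHxW input
    }
}

// Later, inside zero_infer_request:
//   const auto batchSize = _graph->get_batch_size();  // returns the stored 3
// and the request handles the batched buffers accordingly.
```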

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-metadata-used branch from f33bf5e to 095954f Compare October 15, 2025 17:09
…and Compiler - review (remove setting batch_mode to COMPILER after debatching, private methods, tests)
…and Compiler - review (helpers to a separate file, remove get_batch_size for compiledModel, BATCH_MODE is a runtime option)
…and Compiler - review (BATCH_MODE is needed, tests are not skipped)
…and Compiler - review (if compiler doesn't support BatchMode)
… Plugin and Compiler - review (batchSize type)"

This reverts commit 095954f.
…and Compiler - review (if compiler doesn't support BatchMode) - fixes
…and Compiler - review (use partial shape instead of _compiledModel, coverity)
…and Compiler - review (protect original model, minor comments)
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-metadata-used branch from f852a99 to a16d4d8 Compare October 19, 2025 11:45
@DariaMityagina (Contributor Author):

@PatrikStepan @pereanub, @alexandruenache1111 hi! Could you please take another look at this PR? All comments have been addressed.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-metadata-used branch 2 times, most recently from c3498e4 to 6d228f7 Compare October 19, 2025 20:22
@alexandruenache1111 (Contributor) left a comment:

LGTM on the metadata side.

@DariaMityagina (Contributor Author):

@razvanapetroaie hi! Could you please take another look at this PR? We would like to merge this today.

@razvanapetroaie (Contributor) left a comment:

Nothing important imo. I can handle the nits in one of my PRs, so you won't have to rerun the CI jobs just for this.

/**
* @details The number of init schedules, along with the size of each init binary object are read in addition to the
* information provided by the previous metadata versions.
*/
Contributor:

The comment is wrong: this version adds the batch size value, nothing related to init schedules. The same applies to the write function. You may leave this without comments if you wish; I think the description of the class makes it obvious enough.

Metadata<CURRENT_METADATA_VERSION>(blobSizesBeforeVersioning,
CURRENT_OPENVINO_VERSION,
initBlobSizes,
originalBatchSize)
Contributor:

Suggested change
originalBatchSize)
_batchSize)

const std::shared_ptr<IGraph>& graph,
const FilteredConfig& config);
const FilteredConfig& config,
const std::optional<int64_t>& batchSize);
Contributor:

Could use a comment in the doxygen passage above on why we ended up passing this value separately. One would expect the batch size to be visible in the ov::Model object.

: inputDescriptor.shapeFromCompiler;

if (batchSize.has_value()) {
shape[intel_npu::utils::BATCH_AXIS] = ov::Dimension(batchSize.value());
Contributor:

One potential problem that might occur here:
This shape is supposed to be the exact shape used by the original ov::Model object (before compilation). This convention allows using I/O descriptors of the ov::Model as identifiers for the CompiledModel and its derived InferenceRequest. The shape should also match in order to successfully identify I/Os.

If I interpret the code correctly, if the original model used a bounded interval as the batch axis value, you are overriding it with -1 (this line), so it doesn't match anymore. Now, maybe bounded values for the batch axis don't make much sense from an ML perspective. And we also have the same mismatching issue nowadays, since shapeFromIRModel does not store dynamic shapes. So maybe this issue is not really relevant.
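
To make the mismatch concrete, an illustrative-only snippet (the shapes are invented):

```cpp
#include <openvino/core/partial_shape.hpp>

ov::PartialShape original{ov::Dimension(1, 8), 3, 224, 224};         // bounded batch axis
ov::PartialShape overridden{ov::Dimension::dynamic(), 3, 224, 224};  // after the -1 override
// original == overridden is false, so matching I/O descriptors by
// exact shape no longer identifies this input.
```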

@DariaMityagina DariaMityagina added this pull request to the merge queue Oct 20, 2025
Merged via the queue into openvinotoolkit:master with commit 6872b07 Oct 20, 2025
178 checks passed
@DariaMityagina DariaMityagina deleted the icv/dm/plugin_batch-metadata-used branch October 20, 2025 13:27

Labels

category: NPU (OpenVINO NPU plugin), Code Freeze
