
Conversation

Contributor

@DariaMityagina DariaMityagina commented Aug 19, 2025

Alternative solution: #32350

Details:

The task is to explore the possibility of moving the reshaping process to the plugin. This would allow the compiler to receive either a network with batch size 1 if reshaping is successful, or a network with a non-1 batch size if reshaping fails. This approach aims to simplify the batch handling process and reduce dependencies on the compiler.

Changes are applied to dynamic batch cases only; the static batch scenario is not modified.

Flow:

  1. Plugin receives a dynamically batched model
  2. Plugin tries to reshape it to batch=1
  3. If success, plugin invokes the compiler to compile the model with batch=1
  4. Plugin dumps the blob in the FS
  5. A user calls benchmark_app -m model.blob -data_shape [4, 3, 224, 224]
  6. Plugin loads the model onto the device
  7. Plugin sees the batch -> it creates N (4) infer requests (see the conceptual sketch below)
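The conceptual sketch referenced in step 7. This is not the plugin's internal code (the plugin performs the split transparently behind a single user-facing request); the CPU device, path, and shapes are illustrative stand-ins:

#include <openvino/openvino.hpp>
#include <vector>

int main() {
    ov::Core core;
    // Illustrative: a regular model stands in for the imported batch-1 NPU blob.
    auto compiled = core.compile_model("model.xml", "CPU");

    const size_t n = 4;  // user-supplied batch, e.g. -data_shape [4,3,224,224]
    ov::Tensor batched(ov::element::f32, {n, 3, 224, 224});

    std::vector<ov::InferRequest> requests;
    for (size_t i = 0; i < n; ++i) {
        auto request = compiled.create_infer_request();
        // Region-of-interest view over the i-th sample of the batched buffer.
        ov::Tensor sample(batched, {i, 0, 0, 0}, {i + 1, 3, 224, 224});
        request.set_input_tensor(sample);
        request.start_async();
        requests.push_back(std::move(request));
    }
    for (auto& request : requests) {
        request.wait();
    }
    return 0;
}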

I am now experimenting with preserving info about batch dynamism via tensor name encoding.

The approach:
When a model with dynamic batch dimension needs to be compiled with a fixed batch size (batch=1), we encode the original dynamic nature directly into the tensor names before compilation. Specifically, we append a "DYNBATCH_ORIG" suffix (will think of a better one later) to both input and output tensor names.

How it works:
During compilation phase: before calling ov::set_batch(model, 1) on a target case, we modify all input and output tensor names by appending the marker suffix. The marker survives compilation: tensor names are preserved through the OpenVINO compilation pipeline and remain accessible in the compiled blob.

During reconstruction phase: when creating dummy models from IODescriptors (which contain the preserved tensor names), we check for the presence of the marker to determine if the original model had dynamic batching.

Automatic restoration: if the marker is found (only for target debatched cases), we restore the dynamic batch dimension (ov::Dimension(-1)) in the dummy model's shape.
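A minimal sketch of this mechanism, assuming an illustrative suffix constant and helper names (not the exact plugin code). It follows the "additional name" variant discussed under the potential improvements below, i.e. it uses add_names rather than replacing the original name:

#include <openvino/openvino.hpp>
#include <memory>
#include <string>
#include <unordered_set>

static const std::string kDynBatchSuffix = "DYNBATCH_ORIG";  // placeholder marker name

// Compilation phase: tag every I/O tensor with a marked name, then fix the batch to 1.
void encode_dynamic_batch_info(const std::shared_ptr<ov::Model>& model) {
    for (auto& port : model->inputs()) {
        port.get_tensor().add_names({port.get_any_name() + kDynBatchSuffix});
    }
    for (auto& port : model->outputs()) {
        port.get_tensor().add_names({port.get_any_name() + kDynBatchSuffix});
    }
    ov::set_batch(model, 1);
}

// Reconstruction phase: the marker survives in the tensor names preserved in the blob.
bool was_originally_dynamic(const std::unordered_set<std::string>& tensor_names) {
    for (const auto& name : tensor_names) {
        if (name.find(kDynBatchSuffix) != std::string::npos) {
            return true;  // caller restores ov::Dimension(-1) on the batch axis of the dummy model
        }
    }
    return false;
}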

Key advantages:

  • No external metadata: the information travels with the model itself through the standard OpenVINO pipeline
  • Survives serialization: tensor names are part of the core model representation and persist through export/import cycles
  • Minimal overhead: just a string suffix, no complex data structures

Potential improvement to consider:
🟢 Instead of replacing the original tensor name, we should probably add the marker as an additional name to the tensor's name set. This would preserve the original name in case it's being used elsewhere in the pipeline, while still carrying our dynamic batch information. The tensor would then have both the original name and the marked name, providing better compatibility and reducing the risk of breaking existing name-based lookups.

❓ We could reuse the same approach and save Layout information to avoid checking hardcoded BATCH_AXIS when validating the model in validateModelBatch. Probably it should be done in a separate PR.

🟢 I will open this PR for review once I complete full validation and fix any potential issues

Tickets:

  • E-176749

@DariaMityagina DariaMityagina self-assigned this Aug 19, 2025
@DariaMityagina DariaMityagina added the category: NPU OpenVINO NPU plugin label Aug 19, 2025
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 30529c1 to 29bbcdc on August 19, 2025 07:01
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch 3 times, most recently from 6f2a537 to b9cbb0f on September 9, 2025 23:00
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 105c432 to 3a4baa2 on September 10, 2025 11:51
@DariaMityagina DariaMityagina added this to the 2025.4 milestone Sep 22, 2025
Contributor

@sivanov-work sivanov-work left a comment

IMHO, we have too widely spread and unclear batch/dynamic-batch handling logic here, which was actually imposed by a legacy design.

The things I'd be very eager to discuss are:

  1. I think we should remove any open presumption about the N dimension location (even more so getting the batch from the hardcoded BATCH_AXIS position), because it will not work for 100% of cases and we will need to return to this problem again and again, making more and more refactoring.
    Instead, I suggest we use layout information (or ask the user to supply it if it doesn't exist) to get the batch dimension, as this plugin.cpp part was designed to be generic rather than PLUGIN-batch oriented.

  2. Even if we get rid of BATCH_AXIS and retain these various "support-batch-case" checks: I see that these limitations are spread across TWO places: one part on the "plugin.cpp" side where we check PLUGIN mode, and the actual limitations contained in ZeroInferRequest, where we check data structures once again.
    IMHO, plugin.cpp should be clean of any assumptions about what the PLUGIN implementation can or cannot support and should not introduce such fine-grained condition testing. Otherwise we would be destined to always fix/add/invent symmetrical changes in both ZeroInferRequest and plugin.cpp.

  3. If we agreed to consider each N=1 infer request as a possible batched/dynamic-batched infer request, then it would also be possible to lift this burden from plugin.cpp and always use set_batch(1) for any scenario where PLUGIN mode is involved, so that later we could reallocate inner data when set_tensor() comes with a new N.
    That would allow us to implement N=1 in ZeroInferRequest as the generic case and only specialize the set_tensor()/infer() implementation for the event when the user sets new tensors with a different batch value.

As I can see:

  1. We are forced to set_batch(1) when dealing with a batched network, as the ZeroInferRequests for PLUGIN mode rely on that. It seems to me that we MUST set_batch(1) every time we deal with a batched network. We cannot invoke set_batch(1) for every network, as it is not reliable and may fail for a non-batched network; therefore we must determine that the network is a batched one. Even though this entire legacy logic is, IMHO, redundant and weak, I think we should stop here and not proceed further with separating static batch/dynamic batch cases and bringing up more artificial limitations than we already have. So I hope that simply using set_batch(1) is enough here.

  2. ZeroInferRequest deals with the N=1 situation (or doesn't even care what it has got) and only adds a check in set_tensor() that the new tensor is N times larger or smaller than the existing one. If ALL tensors are changed in the same proportion N, then we are dealing with a batched case, whatever static/dynamic it is (see the sketch after this list).

  3. IMHO the ideal situation would be to remove all heuristics/limitations from plugin.cpp and encapsulate them all in ZeroInferRequest.
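A minimal sketch of the proportionality check described in point 2 of the list above; the batch axis is passed in rather than hardcoded, and the names are illustrative (this is not ZeroInferRequest code):

#include <openvino/openvino.hpp>
#include <optional>

// Returns N if userShape equals compiledShape with only the batch dimension scaled by N.
std::optional<size_t> batch_repeat_factor(const ov::Shape& compiledShape,
                                          const ov::Shape& userShape,
                                          size_t batchAxis) {
    if (compiledShape.size() != userShape.size() || batchAxis >= compiledShape.size() ||
        compiledShape[batchAxis] == 0 || userShape[batchAxis] % compiledShape[batchAxis] != 0) {
        return std::nullopt;
    }
    for (size_t i = 0; i < compiledShape.size(); ++i) {
        if (i != batchAxis && compiledShape[i] != userShape[i]) {
            return std::nullopt;  // only the batch dimension may differ
        }
    }
    return userShape[batchAxis] / compiledShape[batchAxis];  // N: number of inner infer runs
}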

const bool batchedModel = ov::get_batch(modelForCompilation) != intel_npu::utils::DEFAULT_BATCH_SIZE;

if (autoOrPluginBatch && pluginBatchingIsSupported && batchedModel) {
_logger.info("Attempting to handle batching on the plugin side.");
Contributor

Tip:
I recommend extending the logger info by adding the current value of the batch, e.g.
".. from {ov::get_batch(modelForCompilation)} to 1"
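A small self-contained sketch of this tip; the helper name is illustrative, and the printf-style Logger call it targets mirrors the ones quoted elsewhere in this PR:

#include <openvino/openvino.hpp>
#include <sstream>
#include <string>

// Illustrative helper: render the current batch dimension for log messages.
std::string batch_to_string(const std::shared_ptr<ov::Model>& model) {
    std::stringstream ss;
    ss << ov::get_batch(model);  // prints e.g. "4", or "?" for a fully dynamic batch
    return ss.str();
}

// Possible usage in the snippet below (sketch only):
// _logger.info("Attempting to handle batching on the plugin side: reshaping batch from %s to 1.",
//              batch_to_string(modelForCompilation).c_str());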

ov::set_batch(modelForCompilation, 1);
// TODO: add debatcher for more complicated cases as set_batch is pretty naive.
} else {
_logger.info("Unable to manage batching on the plugin side, so the compiler will take care of it.");
Contributor

Tip:
printing the batch value in the info message will also help with troubleshooting


updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
} catch (const std::exception& ex) {
_logger.info("Couldn't validate and reshape the model. Batching will be handed by compiler.", ex.what());
Contributor

AUTO & PLUGIN make no difference then; as far as I remember from the compiler code, we roll back to COMPILER mode only if AUTO is enabled, so that we do not overwrite an explicitly requested PLUGIN mode.

Contributor Author

What should happen when PLUGIN mode cannot be applied? Should we fail immediately in this scenario rather than falling back to COMPILER mode?

Contributor Author

Discussed here: #31784 (comment)

ov::Layout layout = ov::layout::get_layout(input);

// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() && shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||
Contributor

I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is not at the 0th position.

Since we have a "layout" attribute affiliated with a model, we could adhere to it.
On the vpux-compiler side we have this logic for determining batches for the debatch method:

  1. check whether or not user specified "layouts" explicitly
  2. check ov::FindBatch outcome
  3. trying to set_batch(1) ...

What do you think about reusing similar logic? Or should we not introduce any logic at the plugin.cpp level at all?

Contributor Author

I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is not at the 0th position.

Discussed with the team offline. Agreed to improve this in separate PRs.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from e932c8d to ef91465 on September 22, 2025 22:04
Contributor

@pereanub pereanub left a comment

I still don't think that considering all the models that have batch size set to 1 as dynamic models is the correct approach here.

Contributor Author

DariaMityagina commented Sep 23, 2025

I still don't think that considering all the models that have batch size set to 1 as dynamic models is the correct approach here.

This approach was inspired by the requirements for this task:

1. Plugin detection: The plugin identifies a network with a dynamic batch size.
2. Batch conversion: It transforms the network into one with a batch size of 1, if possible.
3. Driver compilation: Only after this transformation does the network get passed to the driver for compilation.
4. Isolation from Compiler and Driver: Crucially, neither the compiler nor the driver is aware that the network originally had a dynamic/static batch size.

- To make dynamic batch compatible with older drivers
- Needed compatibility with old NPU Plugins
- [Stretch goal] No need for Metadata in compiled blobs, NPU Plugin to handle batches != 1 automatically using concurrently created infer requests

To minimize the risks, I will try to limit the application of current changes to the dynamic batch for now, after which we can work on a common solution.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch 6 times, most recently from 696966e to 3f6ecd7 on September 30, 2025 08:44
Contributor

pereanub commented Oct 1, 2025

Changes look good to me but @PatrikStepan @razvanapetroaie please have a look.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 3f6ecd7 to 547a304 on October 1, 2025 16:15
const std::optional<size_t> batchSize) {
// Check if tensor was originally dynamic by looking for encoded markers
// This information is needed to restore the original dynamic batching behavior
auto wasOriginallyDynamic = [](const std::unordered_set<std::string>& tensorNames) -> bool {
Contributor Author

Also used in plugin.cpp. Maybe we should move this to a separate function to avoid duplication? TBD.

Contributor Author

Done, 9957766

}

return std::nullopt;
return tensor->get_shape()[intel_npu::utils::BATCH_AXIS];
Contributor Author

@DariaMityagina DariaMityagina Oct 1, 2025

wasDynamic == true is equivalent to origShape[intel_npu::utils::BATCH_AXIS].is_dynamic()

if (localConfig.get<BATCH_MODE>() == ov::intel_npu::BatchMode::PLUGIN) {
OPENVINO_THROW("This model contains states, thus it is not supported when handling batching on the plugin");
// Handle batch mode configuration
if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
Contributor Author

  • Small refactoring: putting all the if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) ... checks under one condition
  • Calling handleDynamicBatching


// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() &&
shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||
Contributor Author

We could reuse the same approach and save Layout information to avoid checking hardcoded BATCH_AXIS when validating the model in validateModelBatch.

Agreed that if we were to do it, we would do so in a separate PR.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 9957766 to c8ed675 on October 1, 2025 20:11
@DariaMityagina DariaMityagina marked this pull request as ready for review October 2, 2025 11:07
@DariaMityagina DariaMityagina requested review from a team as code owners October 2, 2025 11:07
Contributor Author

@PatrikStepan, @razvanapetroaie, @pereanub, @sivanov-work could you please review this PR?

// Encode info in input tensor names
for (auto& input : model->inputs()) {
const std::string originalName = input.get_any_name();
input.get_tensor().set_names({originalName, originalName + suffix});
Contributor

@sivanov-work sivanov-work Oct 2, 2025

Why don't we just serialize the PartialShape into a string by using PartialShape::to_string()? It can be restored easily by calling the PartialShape(string) ctor.
PartialShape gives us more information regarding tensors, so instead of a "flag", which is not guaranteed never to appear naturally as a regular input name, we would have a full shape reconstruction, and we wouldn't need shapeFromCompiler and other peculiar stuff.
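A sketch of that suggestion, assuming a hypothetical marker string (the PartialShape::to_string()/PartialShape(std::string) round trip is the one mentioned above; helper names are illustrative):

#include <openvino/openvino.hpp>
#include <optional>
#include <string>
#include <unordered_set>

static const std::string kOrigShapeMarker = "ORIG_SHAPE=";  // hypothetical marker

// Store the original PartialShape of each input as an extra tensor name before set_batch(1).
void encode_original_shapes(const std::shared_ptr<ov::Model>& model) {
    for (auto& port : model->inputs()) {
        port.get_tensor().add_names(
            {port.get_any_name() + "/" + kOrigShapeMarker + port.get_partial_shape().to_string()});
    }
}

// Recover the original shape later from the names preserved in the blob.
std::optional<ov::PartialShape> decode_original_shape(const std::unordered_set<std::string>& names) {
    for (const auto& name : names) {
        const auto pos = name.find(kOrigShapeMarker);
        if (pos != std::string::npos) {
            return ov::PartialShape(name.substr(pos + kOrigShapeMarker.size()));  // string ctor
        }
    }
    return std::nullopt;
}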

Contributor Author

There are already dedicated metadata fields (shapeFromIR, shapeFromCompiler) for PartialShape storage.

I'd prefer not to duplicate this data in tensor names, especially since the only missing information in PLUGIN mode is the batch dimension.

Our current problem is quite specific - we need to preserve only the knowledge that a batch dimension was originally dynamic, not reconstruct the entire shape from scratch. The existing metadata already provides complete shape information; we just need a lightweight flag to indicate the dynamic batch nature.

Contributor

@sivanov-work sivanov-work Oct 6, 2025

We will not duplicate that data, as shapeFromIR and shapeFromCompiler lose that information once set_batch(1) is called on an initially batched model. That makes the compiler, which gathers that metadata, ignorant of any dynamism in N in the original model, which makes OV/the plugin the only informational expert here.

The point is that we have a chance to adopt a holistic approach rather than relying on different "flags" spread across the whole codebase.

Our current problem is quite specific - we need to preserve only the knowledge that a batch dimension was originally dynamic

Agree, and we can do it by storing an original shape.

The existing metadata already provides complete shape information

Apparently not, as we lose it in shapeFromIR, shapeFromCompiler after set_batch(1)

we just need a lightweight flag to indicate the dynamic batch nature.

I gently disagree; what we actually need is to preserve only the knowledge that a batch dimension was originally dynamic. The lightweight flag is just one of the possible implementations. We can get that info from the original shape easily, simultaneously gaining more flexibility, and we may get rid of shapeFromCompiler eventually.

Ultimately it's up to you, but IMHO preserving shapes and layouts in names is a more holistic and generic approach than the existing flags+metadata combination, and it would allow us to avoid refactoring once again if we decide to go with the idea of "pipeline optimization based on bounds info" later.

const std::string suffix = intel_npu::utils::DYNBATCH_SUFFIX;

// Encode info in input tensor names
for (auto& input : model->inputs()) {
Contributor

Actually, before doing that, I think it would be a good idea to ensure that the model is dynamic. At least an assert should be put here; that would make this function suitable for unit testing. Right now it's content-dependent.

Contributor Author

encodeDynamicBatchInfo is called in handleDynamicBatching, which has this check in the very beginning:

void Plugin::handleDynamicBatching(std::shared_ptr<ov::Model>& modelForCompilation,
                                   Config& localConfig,
                                   const std::function<void(ov::intel_npu::BatchMode)>& updateBatchMode) const {
    // Avoiding risks with static models. TODO: common solution.
    if (!modelForCompilation->is_dynamic()) {

        return;
    }
    ...

Contributor

So this encodeDynamicBatchInfo depends on a precondition checked in handleDynamicBatching, which means it is implicitly and tightly coupled to it. During future refactoring someone may want to reuse this function, that precondition might get broken, and the entire behavior may become unstable.

I suggest either decoupling this function or adding a sanity check to make sure that it does not transform a static model into a model with dynamic prefixes.

Contributor Author

Done, thanks!

const std::string originalName = output.get_any_name();
output.get_tensor().set_names({originalName, originalName + suffix});
}
}
Contributor

If we just put the PartialShape into the tensor name set unconditionally, without adding the DYNBATCH_SUFFIX marker and without checking that the model is a dynamic one, we could name this function something like EncodeModelTensorMetadata (or something else), which IMHO would make it more abstract and holistic.

Config& localConfig,
const std::function<void(ov::intel_npu::BatchMode)>& updateBatchMode) const {
// Avoiding risks with static models. TODO: common solution.
if (!modelForCompilation->is_dynamic()) {
Contributor

If we encode the PartialShape into tensor names, the function encodeDynamicBatchInfo/EncodeModelTensorMetadata will become context-independent, and it seems we will be able to resolve this TODO.

ex.what());

deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
Contributor

How can we get nullptr after deBatchModel()? I guess it may also throw an exception.
Probably it would be better to catch the exception inside deBatchModel, which should return a boolean or a shared_ptr to the debatched model.
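For illustration, one shape such a refactor could take; deBatchModel is the PR's existing helper and its real signature may differ, so this is only a sketch of "catch inside and report success explicitly":

// Hypothetical wrapper: never lets the exception escape, signals failure via nullptr.
std::shared_ptr<ov::Model> tryDeBatchModel(const std::shared_ptr<ov::Model>& model,
                                           const ov::Dimension& newBatch) {
    try {
        auto candidate = model->clone();
        deBatchModel(candidate, newBatch);  // assumed to modify in place and throw on failure
        return candidate;
    } catch (const std::exception&) {
        return nullptr;  // caller decides whether to fall back to COMPILER mode or fail
    }
}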

Contributor Author

Done, thanks!


deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
OPENVINO_THROW("Cannot debatch a model");
Contributor

I could be wrong, but I think we should proceed with the COMPILER path provided that AUTO mode has been specified.

Contributor Author

We will catch this THROW here: line 763

_logger.info("Couldn't validate and reshape the model. Batching will be handled by compiler. Error: %s",
                     ex.what());
updateBatchMode(ov::intel_npu::BatchMode::COMPILER);

And set batch mode to COMPILER.

Trying to debatch it...
[INFO] 12:19:33.452 [NPUPlugin] Couldn't validate and reshape the model. Batching will be handled by compiler. Error: Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:756:
Cannot debatch a model

[INFO] 12:19:33.452 [NPUPlugin] Setting batching mode to COMPILER.

Contributor Author

Are you suggesting that the fallback to COMPILER mode here should only occur when AUTO mode was initially selected?
And when PLUGIN mode is explicitly set, it should fail rather than attempt a fallback?

Contributor

As the default mode is AUTO, a user usually doesn't care how a model will be processed; it depends on the implementation, and according to our (legacy) implementation we try PLUGIN mode first and switch to COMPILER only if it fails.

If the user specifies a non-default value for BATCH_MODE, then it is their clear intention and we presumably should not ignore their choice. Otherwise they waste a lot of time trying to figure out why the results aren't what they expected.

And when PLUGIN mode is explicitly set, it should fail rather than attempt a fallback?

and vice versa for COMPILER
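A sketch of the policy being agreed on in this thread; the enum and signature are simplified stand-ins rather than the plugin's actual types:

#include <functional>
#include <stdexcept>
#include <string>

enum class BatchMode { AUTO, COMPILER, PLUGIN };  // simplified stand-in

// Only an implicit AUTO request may silently fall back; an explicit choice is honored or fails.
void on_plugin_batching_failed(BatchMode requested,
                               const std::exception& ex,
                               const std::function<void(BatchMode)>& updateBatchMode) {
    if (requested == BatchMode::AUTO) {
        updateBatchMode(BatchMode::COMPILER);  // the user didn't pick a mode, so a fallback is acceptable
    } else {
        // Explicitly requested PLUGIN (or COMPILER): surface the error instead of overriding the choice.
        throw std::runtime_error(std::string("Plugin-side batching failed: ") + ex.what());
    }
}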

Contributor Author

@DariaMityagina DariaMityagina Oct 6, 2025

Makes sense, thanks a lot for clarifications!
I will apply this comment to the logic to ensure batch mode modifications occur only when AUTO mode is specified

Contributor Author

Done, thanks!

if (!model->get_variables().empty()) {
if (localConfig.get<BATCH_MODE>() == ov::intel_npu::BatchMode::PLUGIN) {
OPENVINO_THROW(
"This model contains states, thus it is not supported when handling batching on the plugin");
Contributor

What if we have the combination STATES=true, BATCH_MODE=PLUGIN, no batch in a model?

perhaps this check must be placed after validateModelBatch()

Contributor Author

OV documentation batching-on-npu-plugin

OpenVINO™ NPU plugin supports batching either by executing concurrent inferences or by relying on native compiler support for batching.
First, the NPU plugin checks if the following conditions are met:

  • The batch size is on the first axis.
  • All inputs and outputs have the same batch size.
  • The model does not contain states.

Contributor Author

perhaps this check must be placed after validateModelBatch()

Sorry, how so? In case the model contains states, we won't be able to apply PLUGIN batch, therefore we won't even go to validateModelBatch().

Contributor

If I understand the algorithm correctly, it will throw the exception when the following conditions are met as well:
a) BATCH_MODE= PLUGIN
b) a model has STATES
c) a model is not a batched model (it may be even static and not batched)

That is the case I am concerned about: why would we fail on a non-batched model which has states?

Contributor Author

I see what you mean now, will look into it.
Though my understanding is that PLUGIN mode isn't typically used with non-batched models.

Contributor

It's the implicit default mode; I suppose it's natural to expect that a model with N=1 will pass through both BATCH_MODE PLUGIN and COMPILER intact, as we have only two deterministic modes besides AUTO, PLUGIN and COMPILER, and we don't have a NO_BATCH mode.

Otherwise it would be strange if a non-batched model could be compiled using AUTO but not using PLUGIN, as technically PLUGIN is the first approach AUTO tries (PLUGIN mode still uses the same ZeroInferRequest and other primitives even if N=1).

// This information is needed to restore the original dynamic batching behavior
static inline bool wasOriginallyDynamic(const std::unordered_set<std::string>& tensorNames) {
for (const auto& name : tensorNames) {
if (name.find(DYNBATCH_SUFFIX) != std::string::npos) {
Contributor

If we add the marker to the second name only, we could probably skip the original name when iterating.

Contributor Author

@PatrikStepan, @razvanapetroaie, @pereanub could you please review this PR?

Contributor

@sivanov-work sivanov-work left a comment

LGTM, although there are some points that would be interesting to explore in more detail as future possibilities:

  1. We may want to drop validateModelBatch, since it only emphasizes a constraint and prints a pretty error regarding plugin mode ("cannot ... and model must be ..."), while the actual constraint is in ZeroInferRequest, and only that component should decide whether it will be able to infer the model or not. This decision may depend on whether the set_batch(1) || debatchModel() pair succeeds or fails, so there is a chance that validateModelBatch could be completely substituted by this pair called unconditionally, regardless of whether the actual model is batched/dynamic.

  2. I'd suggest serializing the PartialShape and Layout info as additional tensor names in this PR (instead of using a flag), so that we do not have to return to this code in plugin.cpp again and all future additional logic will be contained in ZeroInferRequest.cpp. In particular, that will allow us to get rid of all shapeFrom*** constructs and give more flexibility in deciding whether a model can be inferred (or inferred better) on the ZeroInferRequest side.
    Given those changes, encodeDynamicBatchInfo's only responsibility will be to store PartialShapes & Layout in tensor names; it could then be called for any model and be renamed to encodeTensorInfo.

Having those applied, we may significantly reduce the amount of batch/dynamic processing code in plugin.cpp, as for now it has a lot of batch-related responsibilities.


// Sanity check: ensure we don't transform static models
if (!model->is_dynamic()) {
_logger.warning("Attempting to encode dynamic batch info on a static model. Skipping encoding.");
Contributor

thanks for adding the check!

Though I'd lower the logger severity from warning to debug at least, or remove it completely, because IMHO the "warning" level implies that some action is expected from the user, but there is nothing they can do here.

Contributor Author

Done, thanks!

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from d4dc299 to e6c636b on October 8, 2025 17:24
const std::string suffix = intel_npu::utils::DYNBATCH_SUFFIX;

// Encode info in input tensor names
for (auto& input : model->inputs()) {
Contributor Author

Done, thanks!

ex.what());

deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
Contributor Author

Done, thanks!

Comment on lines 83 to 84
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
if (!wasDynamic) {
Contributor Author

Compatibility:

Suggested change
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
if (!wasDynamic) {
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
auto dynamicBatchFromIR = desc.shapeFromIRModel.has_value() && (*desc.shapeFromIRModel)[intel_npu::utils::BATCH_AXIS].is_dynamic();
if (!wasDynamic && !dynamicBatchFromIR ) {

Contributor Author

Done.


// Sanity check: ensure we don't transform static models
if (!model->is_dynamic()) {
_logger.warning("Attempting to encode dynamic batch info on a static model. Skipping encoding.");
Contributor Author

Done, thanks!

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from e6c636b to 1054763 on October 8, 2025 19:21
Contributor Author

@PatrikStepan, @razvanapetroaie, @pereanub could you please review this PR?

Comment on lines 369 to 373
if (ov::layout::has_batch(layout)) {
partShape[ov::layout::batch_idx(layout)] = newBatch;
}
Contributor Author

Suggested change
if (ov::layout::has_batch(layout)) {
partShape[ov::layout::batch_idx(layout)] = newBatch;
}
if (ov::layout::has_batch(layout)) {
partShape[ov::layout::batch_idx(layout)] = newBatch;
} else {
partShape[intel_npu::utils::BATCH_AXIS] = newBatch;
}

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 1054763 to 8a0031d on October 9, 2025 11:09
…and Compiler - no metadata changes - fix static tests
…and Compiler - fix BA issues - treat every model with batch 1 as a potentially dynamically batched one
…and Compiler - validateModelBatch conditions
…and Compiler - will be prettier, functionality first
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 8a0031d to 85950b2 on October 10, 2025 01:25
…and Compiler - compatibility with older blobs
Contributor

PatrikStepan commented Oct 10, 2025

By adding markers to original input names we are actually storing plugin-owned information in the blob generated by the compiler, but without adding a clear API for this or an entry in the blob metadata. This is different from all the other name conventions we already have for states, shapes and weights-as-input. In all the other cases it is the compiler that transforms the model and adds extra inputs/outputs to it. The driver guarantees backward/forward compatibility for all of these.

Embedding the (dynamic) batch information in input names becomes problematic when we attempt to maintain blob compatibility across multiple OpenVINO versions. If we ever decide to change the convention and reject the blobs we generate today, the plugin will first have to send the blob to the driver for parsing and then try to match inputs against predefined strings, only to later reject the blob if they have the old prefix. Storing this information in a container that the plugin can interpret without parsing the entire blob is preferable.

The prefix we add to the input name is nothing more than an unscalable container for the batch information we need to store. We already know it is enough only for dynamic batch but not for static batch. A separate versioned container would allow us to extend this information in the future.

My proposal is to proceed with the alternative solution where we store the information we need in the metadata that is owned by the plugin and appended to the compiler blob. It is already versioned and it will avoid another naming convention we will have to maintain.
