
Conversation

Contributor

@DariaMityagina DariaMityagina commented Aug 19, 2025

Alternative solution: #32350

Details:

The task is to explore the possibility of moving the reshaping process to the plugin. This would allow the compiler to receive either a network with batch size 1 if reshaping is successful, or a network with a non-1 batch size if reshaping fails. This approach aims to simplify the batch handling process and reduce dependencies on the compiler.

Changes are applied to dynamic batch cases only; the static batch scenario is not modified.

Flow:

  1. Plugin receives a dynamically batched model
  2. Plugin tries to reshape it to batch=1
  3. If success, plugin invokes the compiler to compile the model with batch=1
  4. Plugin dumps the blob in the FS
  5. A user calls benchmark_app -m model.blob -data_shape [4, 3, 224, 224]
  6. Plugin loads the model onto the device
  7. Plugin sees the batch -> it creates N (4) infer requests (see the conceptual sketch below)
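The conceptual sketch referenced in step 7. This is not the plugin's internal code (the plugin performs the split transparently behind a single user-facing request); the CPU device, path, and shapes are illustrative stand-ins:

#include <openvino/openvino.hpp>
#include <vector>

int main() {
    ov::Core core;
    // Illustrative: a regular model stands in for the imported batch-1 NPU blob.
    auto compiled = core.compile_model("model.xml", "CPU");

    const size_t n = 4;  // user-supplied batch, e.g. -data_shape [4,3,224,224]
    ov::Tensor batched(ov::element::f32, {n, 3, 224, 224});

    std::vector<ov::InferRequest> requests;
    for (size_t i = 0; i < n; ++i) {
        auto request = compiled.create_infer_request();
        // Region-of-interest view over the i-th sample of the batched buffer.
        ov::Tensor sample(batched, {i, 0, 0, 0}, {i + 1, 3, 224, 224});
        request.set_input_tensor(sample);
        request.start_async();
        requests.push_back(std::move(request));
    }
    for (auto& request : requests) {
        request.wait();
    }
    return 0;
}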

I am now experimenting with preserving info about batch dynamism via tensor name encoding.

The approach:
When a model with dynamic batch dimension needs to be compiled with a fixed batch size (batch=1), we encode the original dynamic nature directly into the tensor names before compilation. Specifically, we append a "DYNBATCH_ORIG" suffix (will think of a better one later) to both input and output tensor names.

How it works:
During compilation phase: before calling ov::set_batch(model, 1) on a target case, we modify all input and output tensor names by appending the marker suffix. The marker survives compilation: tensor names are preserved through the OpenVINO compilation pipeline and remain accessible in the compiled blob.

During reconstruction phase: when creating dummy models from IODescriptors (which contain the preserved tensor names), we check for the presence of the marker to determine if the original model had dynamic batching.

Automatic restoration: if the marker is found (only for target debatched cases), we restore the dynamic batch dimension (ov::Dimension(-1)) in the dummy model's shape.
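A minimal sketch of this mechanism, assuming an illustrative suffix constant and helper names (not the exact plugin code). It follows the "additional name" variant discussed under the potential improvements below, i.e. it uses add_names rather than replacing the original name:

#include <openvino/openvino.hpp>
#include <memory>
#include <string>
#include <unordered_set>

static const std::string kDynBatchSuffix = "DYNBATCH_ORIG";  // placeholder marker name

// Compilation phase: tag every I/O tensor with a marked name, then fix the batch to 1.
void encode_dynamic_batch_info(const std::shared_ptr<ov::Model>& model) {
    for (auto& port : model->inputs()) {
        port.get_tensor().add_names({port.get_any_name() + kDynBatchSuffix});
    }
    for (auto& port : model->outputs()) {
        port.get_tensor().add_names({port.get_any_name() + kDynBatchSuffix});
    }
    ov::set_batch(model, 1);
}

// Reconstruction phase: the marker survives in the tensor names preserved in the blob.
bool was_originally_dynamic(const std::unordered_set<std::string>& tensor_names) {
    for (const auto& name : tensor_names) {
        if (name.find(kDynBatchSuffix) != std::string::npos) {
            return true;  // caller restores ov::Dimension(-1) on the batch axis of the dummy model
        }
    }
    return false;
}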

Key advantages:

  • No external metadata: the information travels with the model itself through the standard OpenVINO pipeline
  • Survives serialization: tensor names are part of the core model representation and persist through export/import cycles
  • Minimal overhead: just a string suffix, no complex data structures

Potential improvement to consider:
🟢 Instead of replacing the original tensor name, we should probably add the marker as an additional name to the tensor's name set. This would preserve the original name in case it's being used elsewhere in the pipeline, while still carrying our dynamic batch information. The tensor would then have both the original name and the marked name, providing better compatibility and reducing the risk of breaking existing name-based lookups.

❓ We could reuse the same approach and save Layout information to avoid checking hardcoded BATCH_AXIS when validating the model in validateModelBatch. Probably it should be done in a separate PR.

🟢 I will open this PR for review once I complete full validation and fix any potential issues

Tickets:

  • E-176749

@DariaMityagina DariaMityagina self-assigned this Aug 19, 2025
@DariaMityagina DariaMityagina added the category: NPU OpenVINO NPU plugin label Aug 19, 2025
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 30529c1 to 29bbcdc on August 19, 2025 07:01
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch 3 times, most recently from 6f2a537 to b9cbb0f on September 9, 2025 23:00
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 105c432 to 3a4baa2 on September 10, 2025 11:51
@DariaMityagina DariaMityagina added this to the 2025.4 milestone Sep 22, 2025
Contributor

@sivanov-work sivanov-work left a comment

IMHO, we have too widely spread and unclear batch/dynamic-batch handling logic here, which was actually imposed by a legacy design.

The things I'd be very eager to discuss are:

  1. I think we should remove any open presumption about the N dimension location (even more so getting the batch from the hardcoded BATCH_AXIS position), because it will not work for 100% of cases and we will need to return to this problem again and again, making more and more refactoring.
    Instead, I suggest we use layout information (or ask the user to supply it if it doesn't exist) to get the batch dimension, as this plugin.cpp part was designed to be generic rather than PLUGIN-batch oriented.

  2. Even if we get rid of BATCH_AXIS and retain these various "support-batch-case" checks: I see that these limitations are spread across TWO places: one part on the "plugin.cpp" side where we check PLUGIN mode, and the actual limitations contained in ZeroInferRequest, where we check data structures once again.
    IMHO, plugin.cpp should be clean of any assumptions about what the PLUGIN implementation can or cannot support and should not introduce such fine-grained condition testing. Otherwise we would be destined to always fix/add/invent symmetrical changes in both ZeroInferRequest and plugin.cpp.

  3. If we agreed to consider each N=1 infer request as a possible batched/dynamic-batched infer request, then it would also be possible to lift this burden from plugin.cpp and always use set_batch(1) for any scenario where PLUGIN mode is involved, so that later we could reallocate inner data when set_tensor() comes with a new N.
    That would allow us to implement N=1 in ZeroInferRequest as the generic case and only specialize the set_tensor()/infer() implementation for the event when the user sets new tensors with a different batch value.

As I can see:

  1. We are forced to set_batch(1) when dealing with a batched network, as the ZeroInferRequests for PLUGIN mode rely on that. It seems to me that we MUST set_batch(1) every time we deal with a batched network. We cannot invoke set_batch(1) for every network, as it is not reliable and may fail for a non-batched network; therefore we must determine that the network is a batched one. Even though this entire legacy logic is, IMHO, redundant and weak, I think we should stop here and not proceed further with separating static batch/dynamic batch cases and bringing up more artificial limitations than we already have. So I hope that simply using set_batch(1) is enough here.

  2. ZeroInferRequest deals with the N=1 situation (or doesn't even care what it has got) and only adds a check in set_tensor() that the new tensor is N times larger or smaller than the existing one. If ALL tensors are changed in the same proportion N, then we are dealing with a batched case, whatever static/dynamic it is (see the sketch after this list).

  3. IMHO the ideal situation would be to remove all heuristics/limitations from plugin.cpp and encapsulate them all in ZeroInferRequest.
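A minimal sketch of the proportionality check described in point 2 of the list above; the batch axis is passed in rather than hardcoded, and the names are illustrative (this is not ZeroInferRequest code):

#include <openvino/openvino.hpp>
#include <optional>

// Returns N if userShape equals compiledShape with only the batch dimension scaled by N.
std::optional<size_t> batch_repeat_factor(const ov::Shape& compiledShape,
                                          const ov::Shape& userShape,
                                          size_t batchAxis) {
    if (compiledShape.size() != userShape.size() || batchAxis >= compiledShape.size() ||
        compiledShape[batchAxis] == 0 || userShape[batchAxis] % compiledShape[batchAxis] != 0) {
        return std::nullopt;
    }
    for (size_t i = 0; i < compiledShape.size(); ++i) {
        if (i != batchAxis && compiledShape[i] != userShape[i]) {
            return std::nullopt;  // only the batch dimension may differ
        }
    }
    return userShape[batchAxis] / compiledShape[batchAxis];  // N: number of inner infer runs
}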

const bool batchedModel = ov::get_batch(modelForCompilation) != intel_npu::utils::DEFAULT_BATCH_SIZE;

if (autoOrPluginBatch && pluginBatchingIsSupported && batchedModel) {
_logger.info("Attempting to handle batching on the plugin side.");
Contributor

Tip:
I recommend extending the logger info by adding the current value of the batch, e.g.
".. from {ov::get_batch(modelForCompilation)} to 1"
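A small self-contained sketch of this tip; the helper name is illustrative, and the printf-style Logger call it targets mirrors the ones quoted elsewhere in this PR:

#include <openvino/openvino.hpp>
#include <sstream>
#include <string>

// Illustrative helper: render the current batch dimension for log messages.
std::string batch_to_string(const std::shared_ptr<ov::Model>& model) {
    std::stringstream ss;
    ss << ov::get_batch(model);  // prints e.g. "4", or "?" for a fully dynamic batch
    return ss.str();
}

// Possible usage in the snippet below (sketch only):
// _logger.info("Attempting to handle batching on the plugin side: reshaping batch from %s to 1.",
//              batch_to_string(modelForCompilation).c_str());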

ov::set_batch(modelForCompilation, 1);
// TODO: add debatcher for more complicated cases as set_batch is pretty naive.
} else {
_logger.info("Unable to manage batching on the plugin side, so the compiler will take care of it.");
Contributor

Tip:
printing the batch value in the info message will also help with troubleshooting


updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
} catch (const std::exception& ex) {
_logger.info("Couldn't validate and reshape the model. Batching will be handed by compiler.", ex.what());
Contributor

AUTO & PLUGIN make no difference then; as far as I remember from the compiler code, we roll back to COMPILER mode only if AUTO is enabled, so that we do not overwrite an explicitly requested PLUGIN mode.

Contributor Author

What should happen when PLUGIN mode cannot be applied? Should we fail immediately in this scenario rather than falling back to COMPILER mode?

Contributor Author

Discussed here: #31784 (comment)

ov::Layout layout = ov::layout::get_layout(input);

// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() && shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||
Contributor

I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is not at the 0th position.

Since we have a "layout" attribute affiliated with a model, we could adhere to it.
On the vpux-compiler side we have this logic for determining batches for the debatch method:

  1. check whether or not user specified "layouts" explicitly
  2. check ov::FindBatch outcome
  3. trying to set_batch(1) ...

What do you think about reusing similar logic? Or should we not introduce any logic at the plugin.cpp level at all?

Contributor Author

I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is not at the 0th position.

Discussed with the team offline. Agreed to improve this in separate PRs.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from e932c8d to ef91465 on September 22, 2025 22:04
Contributor

@pereanub pereanub left a comment

I still don't think that considering all the models that have batch size set to 1 as dynamic models is the correct approach here.

Contributor Author

DariaMityagina commented Sep 23, 2025

I still don't think that considering all the models that have batch size set to 1 as dynamic models is the correct approach here.

This approach was inspired by the requirements for this task:

1. Plugin detection: The plugin identifies a network with a dynamic batch size.
2. Batch conversion: It transforms the network into one with a batch size of 1, if possible.
3. Driver compilation: Only after this transformation does the network get passed to the driver for compilation.
4. Isolation from Compiler and Driver: Crucially, neither the compiler nor the driver is aware that the network originally had a dynamic/static batch size.

- To make dynamic batch compatible with older drivers
- Needed compatibility with old NPU Plugins
- [Stretch goal] No need for Metadata in compiled blobs, NPU Plugin to handle batches != 1 automatically using concurrently created infer requests

To minimize the risks, I will try to limit the application of current changes to the dynamic batch for now, after which we can work on a common solution.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch 6 times, most recently from 696966e to 3f6ecd7 on September 30, 2025 08:44
Contributor

pereanub commented Oct 1, 2025

Changes look good to me but @PatrikStepan @razvanapetroaie please have a look.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 3f6ecd7 to 547a304 on October 1, 2025 16:15
const std::optional<size_t> batchSize) {
// Check if tensor was originally dynamic by looking for encoded markers
// This information is needed to restore the original dynamic batching behavior
auto wasOriginallyDynamic = [](const std::unordered_set<std::string>& tensorNames) -> bool {
Contributor Author

Also used in plugin.cpp. Maybe we should move this to a separate function to avoid duplication? TBD.

Contributor Author

Done, 9957766

}

return std::nullopt;
return tensor->get_shape()[intel_npu::utils::BATCH_AXIS];
Contributor Author

@DariaMityagina DariaMityagina Oct 1, 2025

wasDynamic == true is equivalent to origShape[intel_npu::utils::BATCH_AXIS].is_dynamic()

if (localConfig.get<BATCH_MODE>() == ov::intel_npu::BatchMode::PLUGIN) {
OPENVINO_THROW("This model contains states, thus it is not supported when handling batching on the plugin");
// Handle batch mode configuration
if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
Contributor Author

  • Small refactoring: putting all the if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) ... checks under one condition
  • Calling handleDynamicBatching


// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() &&
shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||
Contributor Author

We could reuse the same approach and save Layout information to avoid checking hardcoded BATCH_AXIS when validating the model in validateModelBatch.

Agreed that if we were to do it, we would do so in a separate PR.

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 9957766 to c8ed675 on October 1, 2025 20:11
@DariaMityagina DariaMityagina marked this pull request as ready for review October 2, 2025 11:07
@DariaMityagina DariaMityagina requested review from a team as code owners October 2, 2025 11:07
Contributor Author

@PatrikStepan, @razvanapetroaie, @pereanub, @sivanov-work could you please review this PR?

// Encode info in input tensor names
for (auto& input : model->inputs()) {
const std::string originalName = input.get_any_name();
input.get_tensor().set_names({originalName, originalName + suffix});
Contributor

@sivanov-work sivanov-work Oct 2, 2025

Why don't we just serialize the PartialShape into a string by using PartialShape::to_string()? It can be restored easily by calling the PartialShape(string) ctor.
PartialShape gives us more information regarding tensors, so instead of a "flag", which is not guaranteed never to appear naturally as a regular input name, we would have a full shape reconstruction, and we wouldn't need shapeFromCompiler and other peculiar stuff.
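A sketch of that suggestion, assuming a hypothetical marker string (the PartialShape::to_string()/PartialShape(std::string) round trip is the one mentioned above; helper names are illustrative):

#include <openvino/openvino.hpp>
#include <optional>
#include <string>
#include <unordered_set>

static const std::string kOrigShapeMarker = "ORIG_SHAPE=";  // hypothetical marker

// Store the original PartialShape of each input as an extra tensor name before set_batch(1).
void encode_original_shapes(const std::shared_ptr<ov::Model>& model) {
    for (auto& port : model->inputs()) {
        port.get_tensor().add_names(
            {port.get_any_name() + "/" + kOrigShapeMarker + port.get_partial_shape().to_string()});
    }
}

// Recover the original shape later from the names preserved in the blob.
std::optional<ov::PartialShape> decode_original_shape(const std::unordered_set<std::string>& names) {
    for (const auto& name : names) {
        const auto pos = name.find(kOrigShapeMarker);
        if (pos != std::string::npos) {
            return ov::PartialShape(name.substr(pos + kOrigShapeMarker.size()));  // string ctor
        }
    }
    return std::nullopt;
}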

Contributor Author

There are already dedicated metadata fields (shapeFromIR, shapeFromCompiler) for PartialShape storage.

I'd prefer not to duplicate this data in tensor names, especially since the only missing information in PLUGIN mode is the batch dimension.

Our current problem is quite specific - we need to preserve only the knowledge that a batch dimension was originally dynamic, not reconstruct the entire shape from scratch. The existing metadata already provides complete shape information; we just need a lightweight flag to indicate the dynamic batch nature.

Contributor

@sivanov-work sivanov-work Oct 6, 2025

We will not duplicate that data, as shapeFromIR and shapeFromCompiler lose that information once set_batch(1) is called on an initially batched model. That makes the compiler, which gathers that metadata, ignorant of any dynamism in N in the original model, which makes OV/the plugin the only informational expert here.

The point is that we have a chance to adopt a holistic approach rather than relying on different "flags" spread across the whole codebase.

Our current problem is quite specific - we need to preserve only the knowledge that a batch dimension was originally dynamic

Agree, and we can do it by storing an original shape.

The existing metadata already provides complete shape information

Apparently not, as we lose it in shapeFromIR, shapeFromCompiler after set_batch(1)

we just need a lightweight flag to indicate the dynamic batch nature.

I gently disagree; what we actually need is to preserve only the knowledge that a batch dimension was originally dynamic. The lightweight flag is just one of the possible implementations. We can get that info from the original shape easily, simultaneously gaining more flexibility, and we may get rid of shapeFromCompiler eventually.

Ultimately it's up to you, but IMHO preserving shapes and layouts in names is a more holistic and generic approach than the existing flags+metadata combination, and it would allow us to avoid refactoring once again if we decide to go with the idea of "pipeline optimization based on bounds info" later.

const std::string suffix = intel_npu::utils::DYNBATCH_SUFFIX;

// Encode info in input tensor names
for (auto& input : model->inputs()) {
Contributor

Actually, before doing that, I think it would be a good idea to ensure that the model is dynamic. At least an assert should be put here; that would make this function suitable for unit testing. Right now it's content-dependent.

Contributor Author

encodeDynamicBatchInfo is called in handleDynamicBatching, which has this check in the very beginning:

void Plugin::handleDynamicBatching(std::shared_ptr<ov::Model>& modelForCompilation,
                                   Config& localConfig,
                                   const std::function<void(ov::intel_npu::BatchMode)>& updateBatchMode) const {
    // Avoiding risks with static models. TODO: common solution.
    if (!modelForCompilation->is_dynamic()) {

        return;
    }
    ...

Contributor

So this encodeDynamicBatchInfo depends on a precondition checked in handleDynamicBatching, which means it is implicitly and tightly coupled to it. During future refactoring someone may want to reuse this function, that precondition might get broken, and the entire behavior may become unstable.

I suggest either decoupling this function or adding a sanity check to make sure that it does not transform a static model into a model with dynamic prefixes.

Contributor Author

Done, thanks!

const std::string originalName = output.get_any_name();
output.get_tensor().set_names({originalName, originalName + suffix});
}
}
Contributor

If we just put the PartialShape into the tensor name set unconditionally, without adding the DYNBATCH_SUFFIX marker and without checking that the model is a dynamic one, we could name this function something like EncodeModelTensorMetadata (or something else), which IMHO would make it more abstract and holistic.

Config& localConfig,
const std::function<void(ov::intel_npu::BatchMode)>& updateBatchMode) const {
// Avoiding risks with static models. TODO: common solution.
if (!modelForCompilation->is_dynamic()) {
Contributor

If we encode the PartialShape into tensor names, the function encodeDynamicBatchInfo/EncodeModelTensorMetadata will become context-independent, and it seems we will be able to resolve this TODO.

ex.what());

deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
Contributor

How can we get nullptr after deBatchModel()? I guess it may also throw an exception.
Probably it would be better to catch the exception inside deBatchModel, which should return a boolean or a shared_ptr to the debatched model.
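For illustration, one shape such a refactor could take; deBatchModel is the PR's existing helper and its real signature may differ, so this is only a sketch of "catch inside and report success explicitly":

// Hypothetical wrapper: never lets the exception escape, signals failure via nullptr.
std::shared_ptr<ov::Model> tryDeBatchModel(const std::shared_ptr<ov::Model>& model,
                                           const ov::Dimension& newBatch) {
    try {
        auto candidate = model->clone();
        deBatchModel(candidate, newBatch);  // assumed to modify in place and throw on failure
        return candidate;
    } catch (const std::exception&) {
        return nullptr;  // caller decides whether to fall back to COMPILER mode or fail
    }
}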

Contributor Author

Done, thanks!


deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
OPENVINO_THROW("Cannot debatch a model");
Contributor

I could be wrong, but I think we should proceed with the COMPILER path provided that AUTO mode has been specified.

Contributor Author

We will catch this THROW here: line 763

_logger.info("Couldn't validate and reshape the model. Batching will be handled by compiler. Error: %s",
                     ex.what());
updateBatchMode(ov::intel_npu::BatchMode::COMPILER);

And set batch mode to COMPILER.

Trying to debatch it...
[INFO] 12:19:33.452 [NPUPlugin] Couldn't validate and reshape the model. Batching will be handled by compiler. Error: Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:756:
Cannot debatch a model

[INFO] 12:19:33.452 [NPUPlugin] Setting batching mode to COMPILER.

Contributor Author

Are you suggesting that the fallback to COMPILER mode here should only occur when AUTO mode was initially selected?
And when PLUGIN mode is explicitly set, it should fail rather than attempt a fallback?

Contributor

As the default mode is AUTO, a user usually doesn't care how a model will be processed; it depends on the implementation, and according to our (legacy) implementation we try PLUGIN mode first and switch to COMPILER only if it fails.

If the user specifies a non-default value for BATCH_MODE, then it is their clear intention and we presumably should not ignore their choice. Otherwise they waste a lot of time trying to figure out why the results aren't what they expected.

And when PLUGIN mode is explicitly set, it should fail rather than attempt a fallback?

and vice versa for COMPILER
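A sketch of the policy being agreed on in this thread; the enum and signature are simplified stand-ins rather than the plugin's actual types:

#include <functional>
#include <stdexcept>
#include <string>

enum class BatchMode { AUTO, COMPILER, PLUGIN };  // simplified stand-in

// Only an implicit AUTO request may silently fall back; an explicit choice is honored or fails.
void on_plugin_batching_failed(BatchMode requested,
                               const std::exception& ex,
                               const std::function<void(BatchMode)>& updateBatchMode) {
    if (requested == BatchMode::AUTO) {
        updateBatchMode(BatchMode::COMPILER);  // the user didn't pick a mode, so a fallback is acceptable
    } else {
        // Explicitly requested PLUGIN (or COMPILER): surface the error instead of overriding the choice.
        throw std::runtime_error(std::string("Plugin-side batching failed: ") + ex.what());
    }
}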

Contributor Author

@DariaMityagina DariaMityagina Oct 6, 2025

Makes sense, thanks a lot for clarifications!
I will apply this comment to the logic to ensure batch mode modifications occur only when AUTO mode is specified

Contributor Author

Done, thanks!

if (!model->get_variables().empty()) {
if (localConfig.get<BATCH_MODE>() == ov::intel_npu::BatchMode::PLUGIN) {
OPENVINO_THROW(
"This model contains states, thus it is not supported when handling batching on the plugin");
Contributor

What if we have the combination STATES=true, BATCH_MODE=PLUGIN, no batch in a model?

perhaps this check must be placed after validateModelBatch()

Contributor Author

OV documentation batching-on-npu-plugin

OpenVINO™ NPU plugin supports batching either by executing concurrent inferences or by relying on native compiler support for batching.
First, the NPU plugin checks if the following conditions are met:

  • The batch size is on the first axis.
  • All inputs and outputs have the same batch size.
  • The model does not contain states.

Contributor Author

perhaps this check must be placed after validateModelBatch()

Sorry, how so? In case the model contains states, we won't be able to apply PLUGIN batch, therefore we won't even go to validateModelBatch().

Contributor

If I understand the algorithm correctly, it will throw the exception when the following conditions are met as well:
a) BATCH_MODE= PLUGIN
b) a model has STATES
c) a model is not a batched model (it may be even static and not batched)

That is the case I am concerned about: why would we fail on a non-batched model which has states?

Contributor Author

I see what you mean now, will look into it.
Though my understanding is that PLUGIN mode isn't typically used with non-batched models.

Contributor

It's the implicit default mode; I suppose it's natural to expect that a model with N=1 will pass through both BATCH_MODE PLUGIN and COMPILER intact, as we have only two deterministic modes besides AUTO, PLUGIN and COMPILER, and we don't have a NO_BATCH mode.

Otherwise it would be strange if a non-batched model could be compiled using AUTO but not using PLUGIN, as technically PLUGIN is the first approach AUTO tries (PLUGIN mode still uses the same ZeroInferRequest and other primitives even if N=1).

// This information is needed to restore the original dynamic batching behavior
static inline bool wasOriginallyDynamic(const std::unordered_set<std::string>& tensorNames) {
for (const auto& name : tensorNames) {
if (name.find(DYNBATCH_SUFFIX) != std::string::npos) {
Contributor

If we add the marker to the second name only, we could probably skip the original name when iterating.

Contributor Author

@PatrikStepan, @razvanapetroaie, @pereanub could you please review this PR?

Contributor

@sivanov-work sivanov-work left a comment

LGTM, although there are some points that would be interesting to explore in more detail as future possibilities:

  1. We may want to drop validateModelBatch, since it only emphasizes a constraint and prints a pretty error regarding plugin mode ("cannot ... and model must be ..."), while the actual constraint is in ZeroInferRequest, and only that component should decide whether it will be able to infer the model or not. This decision may depend on whether the set_batch(1) || debatchModel() pair succeeds or fails, so there is a chance that validateModelBatch could be completely substituted by this pair called unconditionally, regardless of whether the actual model is batched/dynamic.

  2. I'd suggest serializing the PartialShape and Layout info as additional tensor names in this PR (instead of using a flag), so that we do not have to return to this code in plugin.cpp again and all future additional logic will be contained in ZeroInferRequest.cpp. In particular, that will allow us to get rid of all shapeFrom*** constructs and give more flexibility in deciding whether a model can be inferred (or inferred better) on the ZeroInferRequest side.
    Given those changes, encodeDynamicBatchInfo's only responsibility will be to store PartialShapes & Layout in tensor names; it could then be called for any model and be renamed to encodeTensorInfo.

Having those applied, we may significantly reduce the amount of batch/dynamic processing code in plugin.cpp, as for now it has a lot of batch-related responsibilities.


// Sanity check: ensure we don't transform static models
if (!model->is_dynamic()) {
_logger.warning("Attempting to encode dynamic batch info on a static model. Skipping encoding.");
Contributor

thanks for adding the check!

Though I'd lower the logger severity from warning to debug at least, or remove it completely, because IMHO the "warning" level implies that some action is expected from the user, but there is nothing they can do here.

Contributor Author

Done, thanks!

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from d4dc299 to e6c636b on October 8, 2025 17:24
const std::string suffix = intel_npu::utils::DYNBATCH_SUFFIX;

// Encode info in input tensor names
for (auto& input : model->inputs()) {
Contributor Author

Done, thanks!

ex.what());

deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
Contributor Author

Done, thanks!

Comment on lines 83 to 84
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
if (!wasDynamic) {
Contributor Author

Compatibility:

Suggested change
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
if (!wasDynamic) {
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
auto dynamicBatchFromIR = desc.shapeFromIRModel.has_value() && (*desc.shapeFromIRModel)[intel_npu::utils::BATCH_AXIS].is_dynamic();
if (!wasDynamic && !dynamicBatchFromIR ) {

Contributor Author

Done.


// Sanity check: ensure we don't transform static models
if (!model->is_dynamic()) {
_logger.warning("Attempting to encode dynamic batch info on a static model. Skipping encoding.");
Contributor Author

Done, thanks!

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from e6c636b to 1054763 on October 8, 2025 19:21
Contributor Author

@PatrikStepan, @razvanapetroaie, @pereanub could you please review this PR?

Comment on lines 369 to 373
if (ov::layout::has_batch(layout)) {
partShape[ov::layout::batch_idx(layout)] = newBatch;
}
Contributor Author

Suggested change
if (ov::layout::has_batch(layout)) {
partShape[ov::layout::batch_idx(layout)] = newBatch;
}
if (ov::layout::has_batch(layout)) {
partShape[ov::layout::batch_idx(layout)] = newBatch;
} else {
partShape[intel_npu::utils::BATCH_AXIS] = newBatch;
}

@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 1054763 to 8a0031d on October 9, 2025 11:09
…and Compiler - no metadata changes - fix static tests
…and Compiler - fix BA issues - treat every model with batch 1 as a potentially dynamically batched one
…and Compiler - validateModelBatch conditions
…and Compiler - will be prettier, functionality first
@DariaMityagina DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 8a0031d to 85950b2 on October 10, 2025 01:25
…and Compiler - compatibility with older blobs
Contributor

PatrikStepan commented Oct 10, 2025

By adding markers to original input names we are actually storing plugin-owned information in the blob generated by the compiler, but without adding a clear API for this or an entry in the blob metadata. This is different from all the other name conventions we already have for states, shapes and weights-as-input. In all the other cases it is the compiler that transforms the model and adds extra inputs/outputs to it. The driver guarantees backward/forward compatibility for all of these.

Embedding the (dynamic) batch information in input names becomes problematic when we attempt to maintain blob compatibility across multiple OpenVINO versions. If we ever decide to change the convention and reject the blobs we generate today, the plugin will first have to send the blob to the driver for parsing and then try to match inputs against predefined strings, only to later reject the blob if they have the old prefix. Storing this information in a container that the plugin can interpret without parsing the entire blob is preferable.

The prefix we add to the input name is nothing more than an unscalable container for the batch information we need to store. We already know it is enough only for dynamic batch but not for static batch. A separate versioned container would allow us to extend this information in the future.

My proposal is to proceed with the alternative solution where we store the information we need in the metadata that is owned by the plugin and appended to the compiler blob. It is already versioned and it will avoid another naming convention we will have to maintain.
