[Dynamic batch] Investigate refactoring opportunities for batch management in Plugin and Compiler - ver 2 #31784
base: master
Conversation
Force-pushed from 30529c1 to 29bbcdc.
Force-pushed from 6f2a537 to b9cbb0f.
Force-pushed from 105c432 to 3a4baa2.
IMHO, the batch/dynamic-batch handling logic here is too widespread and unclear, which was largely imposed by the legacy design.
The things I'd be very eager to discuss are:
- I think we should remove any presumption about the location of the N dimension (even more so reading the batch from the hardcoded BATCH_AXIS position), because it will not work for 100% of cases and we will have to return to this problem again and again, doing more and more refactoring. Instead I suggest we use layout information (or ask the user to supply it if it doesn't exist) to get the batch dimension, as this plugin.cpp part was designed to be generic rather than PLUGIN-batch specific.
- Even if we get off BATCH_AXIS and retain these various "support-batch-case" checks, I see that the limitations are spread across TWO places: one part on the "plugin.cpp" side where we check PLUGIN mode, and the actual limitations contained in ZeroInferRequest, where we check the data structures once again. IMHO, plugin.cpp should be free of any assumptions about what the PLUGIN implementation can or cannot support and should not introduce such fine-grained condition testing. Otherwise we will be destined to always fix/add/invent symmetrical changes in both ZeroInferRequest and plugin.cpp.
- If we agreed to treat each N=1 infer request as a possible batched/dynamic-batched infer request, then it would also be possible to relieve this plugin.cpp part of that burden and always use set_batch(1) for any scenario where PLUGIN mode is involved, so that later we could reallocate inner data when set_tensor() comes with a new N. That would allow us to implement N=1 in ZeroInferRequest as the generic case and confine the set_tensor()/infer() implementation to the event when the user sets new tensors with a different batch value.
As I can see:
- We are forced to set_batch(1) when dealing with a batched network, since ZeroInferRequests for PLUGIN mode rely on that. It seems to me that we MUST set_batch(1) every time we deal with a batched network. We cannot invoke set_batch(1) for every network, as it is not reliable and may fail for a non-batched network, therefore we must first determine that the network is a batched one. Even though this entire legacy logic is, IMHO, redundant and weak, I think we should stop here and not go further in separating static-batch/dynamic-batch cases or introducing more artificial limitations than we already have. So I hope that simply using set_batch(1) is enough here.
- ZeroInferRequest deals with the N=1 situation (or doesn't even care what it has got) and only adds a check in set_tensor() that the new tensor is N times larger or smaller than the existing one. If ALL tensors are changed in the same proportion N, then we are dealing with a batched case, whether static or dynamic (see the sketch after this comment).
- IMHO the ideal situation would be to remove all heuristics/limitations from plugin.cpp and encapsulate them all in ZeroInferRequest.
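A minimal sketch of the proportionality check mentioned in the second bullet above (the helper name and its placement are illustrative, not the actual ZeroInferRequest code):

#include <algorithm>
#include <cstddef>
#include <optional>
#include <openvino/openvino.hpp>

// Returns the factor N by which the incoming tensor's 0th dimension differs
// from the baseline one, or std::nullopt if the shapes are not proportional.
std::optional<size_t> batch_ratio(const ov::Shape& baseline, const ov::Shape& incoming) {
    if (baseline.empty() || incoming.size() != baseline.size() || baseline[0] == 0) {
        return std::nullopt;
    }
    if (incoming[0] % baseline[0] != 0 ||
        !std::equal(baseline.begin() + 1, baseline.end(), incoming.begin() + 1)) {
        return std::nullopt;
    }
    return incoming[0] / baseline[0];
}

If every tensor passed via set_tensor() reports the same ratio N, the request could treat it as a (static or dynamic) batched case and reallocate its internal buffers accordingly.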
const bool batchedModel = ov::get_batch(modelForCompilation) != intel_npu::utils::DEFAULT_BATCH_SIZE;

if (autoOrPluginBatch && pluginBatchingIsSupported && batchedModel) {
    _logger.info("Attempting to handle batching on the plugin side.");
Tip: I recommend extending the logger info by adding the current batch value, e.g. "... from {ov::get_batch(modelForCompilation)} to 1" (see the sketch below).
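One possible form of the extended message, as a sketch (the exact wording is illustrative; Logger::info is assumed to take printf-style arguments, as in the other calls quoted in this PR; requires <sstream>):

std::ostringstream originalBatch;
originalBatch << ov::get_batch(modelForCompilation);  // ov::Dimension streams as "?" when dynamic
_logger.info("Attempting to handle batching on the plugin side: batch %s -> 1.",
             originalBatch.str().c_str());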
    ov::set_batch(modelForCompilation, 1);
    // TODO: add debatcher for more complicated cases as set_batch is pretty naive.
} else {
    _logger.info("Unable to manage batching on the plugin side, so the compiler will take care of it.");
Tip: printing the batch value in this info message will also help with troubleshooting.
    updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
} catch (const std::exception& ex) {
    _logger.info("Couldn't validate and reshape the model. Batching will be handed by compiler.", ex.what());
AUTO & PLUGIN make no difference then; as far as I remember from the compiler code, we roll back to COMPILER mode only if AUTO is enabled, so that we do not overwrite an explicitly requested PLUGIN mode.
What should happen when PLUGIN mode cannot be applied? Should we fail immediately in this scenario rather than falling back to COMPILER mode?
Discussed here: #31784 (comment)
ov::Layout layout = ov::layout::get_layout(input);

// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() && shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||
I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is found at a position other than 0.
Since we have a "layout" attribute affiliated with the model, we could adhere to it.
On the vpux-compiler side we have this logic for determining batches in the debatch method:
- check whether or not the user specified "layouts" explicitly
- check the ov::FindBatch outcome
- try set_batch(1) ...
What do you think about reusing similar logic (a sketch follows below)? Or should we not introduce any such logic at the plugin.cpp level at all?
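A rough sketch of what a similar fallback chain could look like on the plugin side. This is purely illustrative: the function names are made up, and ov::get_batch/ov::set_batch stand in for the FindBatch step, since they rely on OpenVINO's own batch detection.

#include <cstdint>
#include <memory>
#include <optional>
#include <openvino/openvino.hpp>

// Step 1: prefer an explicitly supplied/attached layout.
std::optional<int64_t> batch_idx_from_layout(const ov::Output<ov::Node>& input) {
    const ov::Layout layout = ov::layout::get_layout(input);
    if (!layout.empty() && ov::layout::has_batch(layout)) {
        return ov::layout::batch_idx(layout);
    }
    return std::nullopt;
}

// Steps 2-3: fall back to OpenVINO's batch detection and debatching.
bool try_set_batch_to_one(const std::shared_ptr<ov::Model>& model) {
    try {
        ov::set_batch(model, 1);  // throws if no batch dimension can be identified
        return true;
    } catch (const ov::Exception&) {
        return false;
    }
}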
"I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is found at a position other than 0."
Discussed with the team offline. Agreed to improve this in separate PRs.
Force-pushed from e932c8d to ef91465.
I still don't think that treating all models with batch size set to 1 as dynamic models is the correct approach here.
This approach was inspired by the requirements for this task:
To minimize the risks, I will try to limit the application of the current changes to the dynamic batch case for now, after which we can work on a common solution.
Force-pushed from 696966e to 3f6ecd7.
Changes look good to me, but @PatrikStepan @razvanapetroaie please have a look.
Force-pushed from 3f6ecd7 to 547a304.
const std::optional<size_t> batchSize) {
    // Check if tensor was originally dynamic by looking for encoded markers
    // This information is needed to restore the original dynamic batching behavior
    auto wasOriginallyDynamic = [](const std::unordered_set<std::string>& tensorNames) -> bool {
Also used in plugin.cpp. Maybe we should move this to a separate function to avoid duplication? TBD.
Done, 9957766
}

return std::nullopt;
return tensor->get_shape()[intel_npu::utils::BATCH_AXIS];
wasDynamic == true is equivalent to origShape[intel_npu::utils::BATCH_AXIS].is_dynamic()
if (localConfig.get<BATCH_MODE>() == ov::intel_npu::BatchMode::PLUGIN) {
    OPENVINO_THROW("This model contains states, thus it is not supported when handling batching on the plugin");
// Handle batch mode configuration
if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
- Small refactoring: putting everything under one if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) ... condition
- Calling handleDynamicBatching
// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() &&
     shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||
We could reuse the same approach and save the Layout information to avoid checking the hardcoded BATCH_AXIS when validating the model in validateModelBatch.
Agreed that if we were to do it, we would do it in a separate PR.
Force-pushed from 9957766 to c8ed675.
@PatrikStepan, @razvanapetroaie, @pereanub, @sivanov-work could you please review this PR?
// Encode info in input tensor names
for (auto& input : model->inputs()) {
    const std::string originalName = input.get_any_name();
    input.get_tensor().set_names({originalName, originalName + suffix});
Why don't we just serialize PartialShape into a string using PartialShape::to_string()? It can be restored easily by calling the PartialShape(string) ctor.
PartialShape gives us more information about the tensors, so instead of a "flag" (which could in principle collide with a regular input name) we would have a full shape reconstruction and would no longer need shapeFromCompiler and other peculiar stuff. A sketch of this idea follows below.
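A hedged sketch of that idea. The SHAPE_PREFIX marker and the helper names are hypothetical, not existing code; the string constructor of ov::PartialShape is the one referred to above.

#include <optional>
#include <string>
#include <unordered_set>
#include <openvino/openvino.hpp>

static const std::string SHAPE_PREFIX = "ORIG_SHAPE:";  // hypothetical marker

// Store the original PartialShape as an extra tensor name before set_batch(1).
void encode_shape(ov::Output<ov::Node> port) {
    auto names = port.get_tensor().get_names();
    names.insert(SHAPE_PREFIX + port.get_partial_shape().to_string());
    port.get_tensor().set_names(names);
}

// Recover it later from the preserved tensor names (e.g. from an IODescriptor).
std::optional<ov::PartialShape> decode_shape(const std::unordered_set<std::string>& names) {
    for (const auto& name : names) {
        if (name.rfind(SHAPE_PREFIX, 0) == 0) {
            return ov::PartialShape(name.substr(SHAPE_PREFIX.size()));
        }
    }
    return std::nullopt;
}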
There are already dedicated metadata fields (shapeFromIR, shapeFromCompiler) for PartialShape storage.
I'd prefer not to duplicate this data in tensor names, especially since the only missing information in PLUGIN mode is the batch dimension.
Our current problem is quite specific - we need to preserve only the knowledge that a batch dimension was originally dynamic, not reconstruct the entire shape from scratch. The existing metadata already provides complete shape information; we just need a lightweight flag to indicate the dynamic batch nature.
We would not be duplicating that data, as shapeFromIR and shapeFromCompiler lose that information once set_batch(1) is called on an initially batched model. That makes the compiler, which gathers that metadata, ignorant of any dynamism in N in the original model, and leaves OV/Plugin as the only informational expert here.
The point is that we have a chance to adopt a holistic approach rather than relying on different "flags" spread across the codebase.
"Our current problem is quite specific - we need to preserve only the knowledge that a batch dimension was originally dynamic"
Agree, and we can do that by storing the original shape.
"The existing metadata already provides complete shape information"
Apparently not, as we lose it in shapeFromIR and shapeFromCompiler after set_batch(1).
"we just need a lightweight flag to indicate the dynamic batch nature"
Gently disagree; what we actually need is to preserve only the knowledge that a batch dimension was originally dynamic. The lightweight flag is just one possible implementation among others. We can get that info from the original shape easily, simultaneously gaining more flexibility, and we may be able to get rid of shapeFromCompiler eventually.
Ultimately it's up to you, but IMHO preserving shapes and layouts in names is a more holistic and generic approach than the existing flags+metadata combination, and it would allow us to avoid refactoring again if we later decide to go with the idea of "pipeline optimization based on bounds info".
const std::string suffix = intel_npu::utils::DYNBATCH_SUFFIX;

// Encode info in input tensor names
for (auto& input : model->inputs()) {
Actually, before doing that, I think it would be a good idea to ensure that the model is dynamic. At least an assert should be put here; that would make this function suitable for unit testing. Right now it is context-dependent.
encodeDynamicBatchInfo is called in handleDynamicBatching, which has this check at the very beginning:
void Plugin::handleDynamicBatching(std::shared_ptr<ov::Model>& modelForCompilation,
                                   Config& localConfig,
                                   const std::function<void(ov::intel_npu::BatchMode)>& updateBatchMode) const {
    // Avoiding risks with static models. TODO: common solution.
    if (!modelForCompilation->is_dynamic()) {
        return;
    }
    ...
So this encodeDynamicBatchInfo depends on a precondition checked in handleDynamicBatching, and is therefore implicitly highly coupled to it. During future refactoring someone may want to reuse this function, that precondition might get broken, and the entire behavior may become unstable.
I suggest either decoupling this function or adding a sanity check to make sure it does not turn a static model into a model with dynamic markers.
Done, thanks!
    const std::string originalName = output.get_any_name();
    output.get_tensor().set_names({originalName, originalName + suffix});
    }
}
If we just put the PartialShape into the tensor name set unconditionally, without adding the DYNBATCH_SUFFIX marker and without checking that the model is a dynamic one, we could name this function something like EncodeModelTensorMetadata (or something else), which IMHO would make it more abstract and holistic.
Config& localConfig,
const std::function<void(ov::intel_npu::BatchMode)>& updateBatchMode) const {
    // Avoiding risks with static models. TODO: common solution.
    if (!modelForCompilation->is_dynamic()) {
If we encode the PartialShape into tensor names, the function encodeDynamicBatchInfo/EncodeModelTensorMetadata will become context-independent, and it seems we will be able to resolve this TODO.
ex.what());

deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
How can we get nullptr after deBatchModel()? I guess it may also throw an exception.
It would probably be better to catch the exception inside deBatchModel, which should return a boolean or a shared_ptr to the debatched model (a sketch follows below).
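A sketch of the suggested shape of the helper (the name mirrors the PR, but the body is illustrative): it catches its own exceptions and signals failure through the returned pointer, so the caller can decide whether to throw or fall back.

#include <memory>
#include <openvino/openvino.hpp>

std::shared_ptr<ov::Model> deBatchModel(const std::shared_ptr<ov::Model>& model,
                                        const ov::Dimension& newBatch) {
    try {
        auto debatched = model->clone();
        ov::set_batch(debatched, newBatch);
        return debatched;
    } catch (const std::exception&) {
        return nullptr;  // caller decides: throw, or fall back to COMPILER mode
    }
}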
Done, thanks!
deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
    OPENVINO_THROW("Cannot debatch a model");
I could be wrong, but I think we should proceed with the COMPILER path provided that AUTO mode has been specified.
We will catch this THROW here (line 763):
_logger.info("Couldn't validate and reshape the model. Batching will be handled by compiler. Error: %s",
             ex.what());
updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
and set the batch mode to COMPILER.
Trying to debatch it...
[INFO] 12:19:33.452 [NPUPlugin] Couldn't validate and reshape the model. Batching will be handled by compiler. Error: Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:756:
Cannot debatch a model
[INFO] 12:19:33.452 [NPUPlugin] Setting batching mode to COMPILER.
Are you suggesting that the fallback to COMPILER mode here should only occur when AUTO mode was initially selected?
And when PLUGIN mode is explicitly set, it should fail rather than attempt to fall back?
Since the default mode is AUTO, a user usually doesn't care how the model will be processed; it depends on the implementation, and according to our (legacy) implementation we try PLUGIN mode first and switch to COMPILER only if it fails.
If the user specifies a non-default value for BATCH_MODE, then it is their clear intention, and presumably we should not ignore their choice. Otherwise they waste a lot of time trying to figure out why the results aren't what they expected.
"And when PLUGIN mode is explicitly set, it should fail rather than attempt to fall back?"
And vice versa for COMPILER.
Makes sense, thanks a lot for the clarifications!
I will apply this comment to the logic to ensure batch mode modifications occur only when AUTO mode is specified.
Done, thanks!
if (!model->get_variables().empty()) {
    if (localConfig.get<BATCH_MODE>() == ov::intel_npu::BatchMode::PLUGIN) {
        OPENVINO_THROW(
            "This model contains states, thus it is not supported when handling batching on the plugin");
What if we have the combination STATES=true, BATCH_MODE=PLUGIN, and no batch in the model?
Perhaps this check should be placed after validateModelBatch().
From the OV documentation (batching-on-npu-plugin):
OpenVINO™ NPU plugin supports batching either by executing concurrent inferences or by relying on native compiler support for batching.
First, the NPU plugin checks if the following conditions are met:
- The batch size is on the first axis.
- All inputs and outputs have the same batch size.
- The model does not contain states.
"Perhaps this check should be placed after validateModelBatch()"
Sorry, how so? If the model contains states, we won't be able to apply PLUGIN batching, therefore we won't even get to validateModelBatch().
If I understand the algorithm correctly, it will also throw the exception when the following conditions are met:
a) BATCH_MODE = PLUGIN
b) the model has STATES
c) the model is not a batched model (it may even be static and not batched)
That is the case I am concerned about: why would we fail on a non-batched model which has states?
I see what you mean now, will look into it.
Though my understanding is that PLUGIN mode isn't typically used with non-batched models.
It's an implicit default mode; I suppose it's natural to expect that a model with N=1 will pass through both BATCH_MODE PLUGIN and COMPILER intact, as we have only two deterministic modes besides AUTO (PLUGIN and COMPILER) and we don't have a NO_BATCH mode.
Otherwise it would be strange if a non-batched model compiled with AUTO but failed to compile with PLUGIN, since technically PLUGIN is the first approach AUTO tries (PLUGIN mode still uses the same ZeroInferRequest and other primitives even if N=1).
// This information is needed to restore the original dynamic batching behavior
static inline bool wasOriginallyDynamic(const std::unordered_set<std::string>& tensorNames) {
    for (const auto& name : tensorNames) {
        if (name.find(DYNBATCH_SUFFIX) != std::string::npos) {
If we add the marker to the second name only, we should probably skip the original name in this iteration.
@PatrikStepan, @razvanapetroaie, @pereanub could you please review this PR?
LGTM, although there are some points that would be interesting to explore in more detail as future possibilities:
- We may want to get rid of validateModelBatch, since it only enforces a constraint and prints a pretty error regarding plugin mode ("cannot ... and model must be ..."), while the actual constraint is in ZeroInferRequest, and only that component should decide whether it will be able to infer the model or not. This decision may depend on whether the pair of functions set_batch(1) || debatchModel() succeeds or fails, so there is a chance that validateModelBatch could be completely replaced by this pair called unconditionally, regardless of whether the actual model is batched/dynamic.
- I'd suggest serializing the PartialShape and Layout info as additional tensor names in this PR (instead of using a flag), so that we don't have to return to this code in Plugin.cpp again and all future additional logic stays in ZeroInferRequest.cpp. In particular, that would allow us to get rid of all the shapeFrom*** constructs and give more flexibility in deciding whether a model can be inferred (or inferred better) on the ZeroInferRequest side. Given those changes, the only responsibility of encodeDynamicBatchInfo would be to store PartialShapes & Layout in tensor names; it could then be called for any model and be renamed to encodeTensorInfo.
Having those applied, we may significantly reduce the amount of batch/dynamic processing code in this Plugin.cpp, which currently has a lot of batch-related responsibilities.
// Sanity check: ensure we don't transform static models
if (!model->is_dynamic()) {
    _logger.warning("Attempting to encode dynamic batch info on a static model. Skipping encoding.");
Thanks for adding the check!
Though I'd lower the logger severity from warning to debug at least, or remove it completely, because IMHO the "warning" level implies some action is expected from the user, but there is nothing they can do here.
Done, thanks!
Force-pushed from d4dc299 to e6c636b.
const std::string suffix = intel_npu::utils::DYNBATCH_SUFFIX;

// Encode info in input tensor names
for (auto& input : model->inputs()) {
Done, thanks!
ex.what());

deBatchModel(modelForCompilation, ov::Dimension(1));
if (!modelForCompilation) {
Done, thanks!
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
if (!wasDynamic) {
Compatibility:
Suggested change, from:
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
if (!wasDynamic) {
to:
auto wasDynamic = intel_npu::utils::wasOriginallyDynamic(desc.outputTensorNames);
auto dynamicBatchFromIR = desc.shapeFromIRModel.has_value() && (*desc.shapeFromIRModel)[intel_npu::utils::BATCH_AXIS].is_dynamic();
if (!wasDynamic && !dynamicBatchFromIR) {
Done.
// Sanity check: ensure we don't transform static models
if (!model->is_dynamic()) {
    _logger.warning("Attempting to encode dynamic batch info on a static model. Skipping encoding.");
Done, thanks!
Force-pushed from e6c636b to 1054763.
@PatrikStepan, @razvanapetroaie, @pereanub could you please review this PR?
if (ov::layout::has_batch(layout)) {
    partShape[ov::layout::batch_idx(layout)] = newBatch;
}
Suggested change, from:
if (ov::layout::has_batch(layout)) {
    partShape[ov::layout::batch_idx(layout)] = newBatch;
}
to:
if (ov::layout::has_batch(layout)) {
    partShape[ov::layout::batch_idx(layout)] = newBatch;
} else {
    partShape[intel_npu::utils::BATCH_AXIS] = newBatch;
}
Force-pushed from 1054763 to 8a0031d.
…and Compiler - no metadata changes
…and Compiler - no metadata changes - fix static tests
…and Compiler - fix BA issues - treat every model with batch 1 as a potentially dynamically batched one
…and Compiler - validateModelBatch conditions
…and Compiler - dynamic dims limitation
…and Compiler - additional checks
…and Compiler - simplify
…and Compiler - will be prettier, functionality first
…and Compiler - clean up
…and Compiler - clean up
…and Compiler - fix tests
…and Compiler - review comments
…and Compiler - build warning fixes
Force-pushed from 8a0031d to 85950b2.
…and Compiler - compatibility with older blobs
By adding markers to original input names we are actually storing plugin-owned information in the blob generated by the compiler, but without adding a clear API for this or an entry in the blob metadata. This is different from all the other naming conventions we already have for states, shapes and weights-as-input. In all the other cases it is the compiler that transforms the model and adds extra inputs/outputs to it. The driver guarantees backward/forward compatibility for all of these. Embedding the (dynamic) batch information in input names becomes problematic when we attempt to maintain blob compatibility across multiple OpenVINO versions. If we ever decide to change the convention and reject the blobs we generate today, the plugin will have to first send the blob to the driver for parsing, then try to match inputs against predefined strings, only to later reject the blob if they have the old prefix. Storing this information in a container that the plugin can interpret without parsing the entire blob is preferable. The prefix we add to the input name is nothing more than an unscalable container for the batch information we need to store. We already know it is enough only for dynamic batch but not for a static batch. A separate versioned container would allow us to extend this information in the future (a sketch of what such an entry could look like follows below). My proposal is to proceed with the alternative solution where we store the information we need in the metadata that is owned by the plugin and appended to the compiler blob. It is already versioned, and it will avoid another naming convention we would have to maintain.
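Purely as an illustration of what a versioned, plugin-owned entry could look like (this is not the actual Metadata layout of the NPU plugin, just a sketch of the idea):

#include <cstdint>

struct DynamicBatchMetadata {
    uint32_t version = 1;          // bumped whenever the fields below change
    bool wasDynamicBatch = false;  // replaces the name-suffix marker
    int64_t originalBatch = -1;    // -1 == dynamic; otherwise the static value
};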
Alternative solution: #32350
Details:
The task is to explore the possibility of moving the reshaping process to the plugin. This would allow the compiler to receive either a network with batch size 1 if reshaping is successful, or a network with a non-1 batch size if reshaping fails. This approach aims to simplify the batch handling process and reduce dependencies on the compiler.
Changes are applied to Dynamic batch cases only. Static batch scenario isn't modified.
Flow:
batch=1
batch=1
-m model.blob
-data_shape [4, 3, 224, 224]
I am now experimenting with preserving info about batch dynamism via tensor name encoding.
The approach:
When a model with dynamic batch dimension needs to be compiled with a fixed batch size (batch=1), we encode the original dynamic nature directly into the tensor names before compilation. Specifically, we append a "DYNBATCH_ORIG" suffix (will think of a better one later) to both input and output tensor names.
How it works:
During compilation phase: before calling ov::set_batch(model, 1) on a target case, we modify all input and output tensor names by appending the marker suffix. The marker survives compilation: tensor names are preserved through the OpenVINO compilation pipeline and remain accessible in the compiled blob.
During reconstruction phase: when creating dummy models from IODescriptors (which contain the preserved tensor names), we check for the presence of the marker to determine if the original model had dynamic batching.
Automatic restoration: if the marker is found (only for the targeted debatched cases), we restore the dynamic batch dimension (ov::Dimension(-1)) in the dummy model's shape. A condensed code sketch follows below.
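A condensed sketch of the two sides of this mechanism, based on the snippets quoted earlier in the review (DYNBATCH_SUFFIX stands for the marker; the free-function form is illustrative):

#include <memory>
#include <string>
#include <unordered_set>
#include <openvino/openvino.hpp>

// Compilation phase: add the marked name alongside the original one.
void encode_dynamic_batch_info(const std::shared_ptr<ov::Model>& model, const std::string& suffix) {
    for (auto& port : model->inputs()) {
        const std::string originalName = port.get_any_name();
        port.get_tensor().set_names({originalName, originalName + suffix});
    }
    for (auto& port : model->outputs()) {
        const std::string originalName = port.get_any_name();
        port.get_tensor().set_names({originalName, originalName + suffix});
    }
}

// Reconstruction phase: detect the marker in the names preserved by the blob.
bool was_originally_dynamic(const std::unordered_set<std::string>& names, const std::string& suffix) {
    for (const auto& name : names) {
        if (name.find(suffix) != std::string::npos) {
            return true;
        }
    }
    return false;
}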
Key advantages:
Potential improvement to consider:
🟢 Instead of replacing the original tensor name, we should probably add the marker as an additional name to the tensor's name set. This would preserve the original name in case it's being used elsewhere in the pipeline, while still carrying our dynamic batch information. The tensor would then have both the original name and the marked name, providing better compatibility and reducing the risk of breaking existing name-based lookups.
❓ We could reuse the same approach and save the Layout information to avoid checking the hardcoded BATCH_AXIS when validating the model in validateModelBatch. Probably it should be done in a separate PR.
🟢 I will open this PR for review once I complete full validation and fix any potential issues.
Tickets: