[Dynamic batch] Investigate refactoring opportunities for batch management in Plugin and Compiler - new metadata version #32350
Conversation
src/plugins/intel_npu/src/compiler_adapter/src/ze_graph_ext_wrappers.cpp (outdated; resolved)
```cpp
Metadata<CURRENT_METADATA_VERSION>(blobSizesBeforeVersioning,
                                   CURRENT_OPENVINO_VERSION,
                                   initBlobSizes,
                                   originalBatchSize)
```
Can we pass here information from the CompiledModel itself, not from the _graph?
Done, thanks!
93181ac
Suggested change:
```diff
-originalBatchSize)
+_batchSize)
```
LGTM with minor comments
src/plugins/intel_npu/src/utils/include/intel_npu/utils/utils.hpp (outdated; resolved)
Force-pushed from 3d11a35 to 93181ac
@PatrikStepan hi! Could you please take another look at this PR?
Force-pushed from 61e0dd0 to b5a2daa
```cpp
// If we have successfully debatched the model on the PLUGIN side, we should
// avoid repeating the same in the compiler by resetting the batch mode
updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
```
@DariaMityagina @pereanub what if we remove BATCH_MODE from the config here? From the compiler source code, it seems to have the same effect.
BATCH_MODE is still a private property; today it is added to the list of compiler configs only by the plugin itself. With the changes from this PR, maybe it should be added only when the fallback to the compiler implementation is required?
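For illustration only — assuming the options forwarded to the compiler are materialized as an ov::AnyMap (the map and the function wrapping it are hypothetical, not the PR's actual code), dropping the key would amount to:

```cpp
#include "openvino/core/any.hpp"
#include "intel_npu/npu_private_properties.hpp"  // assumed header for ov::intel_npu::batch_mode

// Hypothetical sketch: instead of forcing BATCH_MODE=COMPILER after a
// successful plugin-side debatch, simply drop the (private) option so the
// compiler applies its default handling to the already batch-1 model.
void dropBatchMode(ov::AnyMap& compilerOptions) {
    compilerOptions.erase(ov::intel_npu::batch_mode.name());
}
```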
> With the changes from this PR, maybe it should be added only when the fallback to the compiler implementation is required?
That's exactly what's happening here, let's take a closer look.
If BATCH_MODE = AUTO or PLUGIN:
- Verify if the model is compatible with batching on the plugin side
- Attempt to reshape the model using set_batch(1)
- If this is successful, we continue with the PLUGIN BATCH_MODE:
  - Write/read the batch size into the NPU plugin metadata
  - The reshaped model is sent to the compiler. BATCH_MODE must be set to COMPILER
  - Adjust the metadata post-compilation to retain the original batch value
- If it isn't successful:
  - If BATCH_MODE = AUTO: we fall back to BATCH_MODE = COMPILER
  - If BATCH_MODE = PLUGIN: we fail
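A minimal sketch of the success/failure decision above, using OpenVINO's public ov::get_batch / ov::set_batch helpers; the function name and exact wiring are illustrative, not the PR's actual code:

```cpp
#include <memory>
#include <optional>
#include "openvino/core/except.hpp"
#include "openvino/core/model.hpp"

// Illustrative only: returns the original batch size when plugin-side
// debatching succeeds, std::nullopt when the caller must fall back (AUTO)
// or fail (PLUGIN).
std::optional<int64_t> tryDebatchOnPlugin(const std::shared_ptr<ov::Model>& model) {
    try {
        const ov::Dimension batch = ov::get_batch(model);  // throws if no batch axis is identified
        if (!batch.is_static() || batch.get_length() == 1) {
            return std::nullopt;  // nothing to debatch on the plugin side
        }
        const int64_t originalBatch = batch.get_length();
        ov::set_batch(model, 1);   // the compiler only ever sees batch 1
        return originalBatch;      // written into the NPU plugin metadata
    } catch (const ov::Exception&) {
        return std::nullopt;       // model is not compatible with plugin-side batching
    }
}
```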
Right, I was challenging this part: "BATCH_MODE must be set to COMPILER". If BATCH_MODE is not found in the config, the compiler will attempt to apply its own batching mechanism anyway.
Can BATCH_MODE remain a hint only for the plugin?
- If set to PLUGIN: try to debatch the model and pass the modified model to the compiler without any extra configs if debatching succeeds; throw otherwise
- If set to AUTO, or if not set at all: try to debatch the model and pass the modified model to the compiler without any extra configs if debatching succeeds; fall back to the compiler implementation otherwise
- If set to COMPILER, or on fallback to the compiler implementation: pass the original model to the compiler without any extra configs
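A sketch of that proposal, reusing the hypothetical tryDebatchOnPlugin helper from above (control flow illustrative only; batchMode is an assumed local of type ov::intel_npu::BatchMode):

```cpp
// Illustrative only: BATCH_MODE acts purely as a plugin hint and is never
// forwarded to the compiler.
switch (batchMode) {
case ov::intel_npu::BatchMode::PLUGIN:
    if (!tryDebatchOnPlugin(model).has_value()) {
        OPENVINO_THROW("Plugin-side batching was requested but the model could not be debatched");
    }
    break;  // compile the reshaped (batch-1) model, no extra configs
case ov::intel_npu::BatchMode::AUTO:
    tryDebatchOnPlugin(model);  // on failure, silently fall back to the compiler path
    break;
case ov::intel_npu::BatchMode::COMPILER:
default:
    break;  // pass the original model through, no extra configs
}
```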
I see. I removed the "BATCH_MODE must be set to COMPILER" requirement after successful debatching (here: fb6363b).
However, I think it still makes sense to pass the config value for the remaining cases to avoid duplicating the plugin-side validation logic in the compiler. This way we can directly use the updated value from the plugin instead of re-implementing/repeating the same checks just to set BATCH_MODE.
At least in the current state. We can adjust this implementation later through a dedicated task with focused improvements.
Thanks!
Discussed offline.
NPU_BATCH_MODE is now a Runtime option: 977e1b1
Thanks!
Not applicable anymore.
```cpp
// Handle batch mode configuration
std::optional<ov::Dimension> originalBatch = std::nullopt;
if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
```
BATCH_MODE is not available when compiler does not support this option. But batching could still be handled on the plugin side (in some cases at least). Isn't that the actual requirement that needs to be addressed in this PR?
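One hedged way to express that decoupling (helper names reused from the sketches above; not the PR's actual code):

```cpp
// Illustrative: the plugin-side attempt must not depend on the compiler
// exposing BATCH_MODE at all.
std::optional<int64_t> originalBatch = tryDebatchOnPlugin(model);
if (!originalBatch.has_value()) {
    // Fall back to compiler-side batching, but only forward the option when
    // this compiler version actually advertises support for it.
    if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
        updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
    }
}
```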
Force-pushed from 977e1b1 to 081c268
It is not very clear to me how the fixed batch size case is handled today.
E.g.: the batch size of the network's I/O is not dynamic but fixed to 3x... How is this supposed to work today?
@pereanub, the static batch size is fixed here:
Instead of:
So, when batch=3 (static) is used, e.g.:
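As a made-up illustration of the static case (the helper calls are OpenVINO's public API, the surrounding code is hypothetical): a model with inputs fixed to batch 3 can be reshaped to batch 1 on the plugin side, with the original value recorded in the metadata.

```cpp
#include "openvino/runtime/core.hpp"

void staticBatchExample() {
    // Illustrative: model inputs are statically shaped, e.g. [3, 3, 224, 224].
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");  // placeholder path
    const int64_t originalBatch = ov::get_batch(model).get_length();  // 3
    ov::set_batch(model, 1);  // the compiler receives [1, 3, 224, 224]
    // originalBatch (3) goes into the NPU plugin metadata, so the runtime can
    // split an incoming batch-3 request into three batch-1 inferences.
}
```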
src/plugins/intel_npu/src/compiler_adapter/src/plugin_compiler_adapter.cpp (outdated; resolved)
Force-pushed from f33bf5e to 095954f
…and Compiler - review, metadata used
…and Compiler - clang
…and Compiler - review
…and Compiler - remove some unused logic
…and Compiler - test fixes
…and Compiler - review (remove setting batch_mode to COMPILER after debatching, private methods, tests)
…and Compiler - review (helpers to a separate file, remove get_batch_size for compiledModel, BATCH_MODE is a runtime option)
…and Compiler - review (batchSize type)
…and Compiler - review (BATCH_MODE is needed, tests are not skipped)
…and Compiler - review (if compiler doesn't support BatchMode)
… Plugin and Compiler - review (batchSize type)" This reverts commit 095954f.
…and Compiler - review (if compiler doesn't support BatchMode) - fixes
…and Compiler - review (use partial shape instead of _compiledModel, coverity)
…and Compiler - review (redundant clone)
…and Compiler - review (protect original model, minor comments)
…and Compiler - review
Force-pushed from f852a99 to a16d4d8
@PatrikStepan @pereanub, @alexandruenache1111 hi! Could you please take another look at this PR? All comments have been addressed.
Force-pushed from c3498e4 to 6d228f7
…and Compiler - review (metadata write/read)
LGTM on the metadata side.
@razvanapetroaie hi! Could you please take another look at this PR? We would like to merge this today.
Nothing important imo. I can handle the nits in one of my PRs, so you won't have to rerun the CI jobs just for this.
```cpp
/**
 * @details The number of init schedules, along with the size of each init binary object are read in addition to the
 * information provided by the previous metadata versions.
 */
```
Comment is wrong, this version is adding the batch size value, nothing related to init schedules. Same about the write function. You may leave this without comments if you wish, I think the description of the class makes this obvious enough.
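If a comment is kept, something along these lines would match what this version actually adds (wording illustrative):

```cpp
/**
 * @details The original batch size of the model is read in addition to the
 * information provided by the previous metadata versions.
 */
```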
```cpp
Metadata<CURRENT_METADATA_VERSION>(blobSizesBeforeVersioning,
                                   CURRENT_OPENVINO_VERSION,
                                   initBlobSizes,
                                   originalBatchSize)
```
Suggested change:
```diff
-originalBatchSize)
+_batchSize)
```
```diff
 const std::shared_ptr<IGraph>& graph,
-const FilteredConfig& config);
+const FilteredConfig& config,
+const std::optional<int64_t>& batchSize);
```
Could use a comment in the doxygen passage above on why we ended up passing this value separately. One would expect the batch size to be visible in the ov::Model object.
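A possible wording for that doxygen note, grounded in the flow discussed above (illustrative, not the PR's actual text):

```cpp
/**
 * @param batchSize The original batch size of the model. It is passed
 * separately because the ov::Model handed to the compiler has already been
 * reshaped to batch 1 on the plugin side, so the value can no longer be
 * recovered from the model itself.
 */
```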
```cpp
: inputDescriptor.shapeFromCompiler;

if (batchSize.has_value()) {
    shape[intel_npu::utils::BATCH_AXIS] = ov::Dimension(batchSize.value());
```
One potential problem that might occur here:
This shape is supposed to be the exact shape used by the original ov::Model object (before compilation). This convention allows using I/O descriptors of the ov::Model as identifiers for the CompiledModel and its derived InferenceRequest. The shape should also match in order to successfully identify I/Os.
If I interpret the code correctly, if the original model used a bounded interval as the batch axis value, you are overriding it using -1 (this line), so it doesn't match anymore. Now, maybe bounded values for the batch axis don't make much sense if adopting an ML perspective. And we also have the same mismatching issue nowadays, since shapeFromIRModel is not storing dynamic stuff. So maybe this issue is not really relevant.
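To illustrate the mismatch with made-up values: a bounded batch axis in the original model is not equal to the fully dynamic one produced by the override.

```cpp
#include "openvino/core/partial_shape.hpp"

// Made-up shapes, for illustration only.
const ov::PartialShape original{ov::Dimension(1, 8), 3, 224, 224};         // [1..8, 3, 224, 224]
const ov::PartialShape overridden{ov::Dimension::dynamic(), 3, 224, 224};  // [?, 3, 224, 224]
// original[0] == overridden[0] evaluates to false: a bounded interval is not
// the same Dimension as a fully dynamic one, so shape-based I/O matching breaks.
```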
Details:
The task is to explore the possibility of moving the reshaping process to the plugin. This would allow the compiler to receive either a network with batch size 1 if reshaping is successful, or a network with a non-1 batch size if reshaping fails. This approach aims to simplify the batch handling process and reduce dependencies on the compiler.
Flow: [diagram: the model is reshaped to batch=1 before being passed to the compiler]
```
-m model.blob
-data_shape [4, 3, 224, 224]
```
Tickets: