[NPU] Model serialization/deserialization without weights copies #31939
    static size_t defaultValue() {
        return 0;
    }
Will tune this later to see which value yields the best performance. For now, we assume 0 is the best candidate (only weights pointers & sizes are stored).
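To make the effect of the default concrete, here is a minimal sketch of the copy-versus-metadata decision. `shouldCopyWeights` is a hypothetical helper, not a function from the plugin, and it assumes the strict "smaller than" comparison stated in the PR description.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the NPU plugin): returns true when a
// constant's buffer would be copied into the serialized blob, and false when
// only its pointer and size would be stored as metadata.
constexpr bool shouldCopyWeights(std::size_t bufferSizeBytes, std::size_t thresholdBytes) {
    return bufferSizeBytes < thresholdBytes;
}

// With a threshold of 0, no unsigned size is smaller, so nothing is copied.
static_assert(!shouldCopyWeights(0, 0), "empty buffers are not copied at threshold 0");
static_assert(!shouldCopyWeights(4096, 0), "large buffers are not copied at threshold 0");
```

Under this assumption, raising the threshold trades extra copies of small buffers for fewer metadata entries, which is the trade-off that the later tuning would measure.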
Other review threads:
src/plugins/intel_npu/src/compiler_adapter/src/driver_compiler_adapter.cpp (resolved)
src/plugins/intel_npu/tests/functional/behavior/npu_driver_compiler_adapter/custom_stream.cpp (outdated, resolved)
src/plugins/intel_npu/src/al/include/intel_npu/weights_pointer_attribute.hpp (outdated, resolved)
Decided to merge this on the master branch after the branch for OV25.4 is created. On its own, this PR brings no benefit to the release without further changes in the driver or the further plugin changes planned for OV26.0.
Details:
- The `ov::Model` is passed through the driver without copying its weights into a separate buffer.
- `intel_npu::use_base_model_serializer` - a switch between the old & new serialization algorithms, and `intel_npu::serialization_weights_size_threshold` - controls which weights are copied into a separate buffer and which ones have only metadata (memory location & size) stored as runtime information. More concretely, weights smaller than this value will be copied.
- `vcl_serializer.hpp` is meant to contain all operations required to prepare an `ov::Model` as input for the VCL interface. This covers model serialization (with either the new or the old serializer) as well as I/O & config serialization.
- `xml_serializer.hpp` is a more generic, weightless implementation of the OV serializer (no weights are copied unless `serialization_weights_size_threshold` is used).
- The `ov::Constant` nodes are traversed and weights metadata (`intel_npu::WeightsPointerAttribute`) is placed as runtime information on the nodes whose buffers are not smaller than `serialization_weights_size_threshold`.
- `intel_npu::StreamSerialize` is called, which uses `intel_npu::XmlSerializer` to serialize the model. Note that `StreamSerialize` uses a slightly different format within the buffer (metadata containing offsets & sizes, custom data, weights & the XML graph); see `ov::pass::StreamSerialize` for details.
- `intel_npu::XmlSerializer` will not write weights into its dedicated buffer if the `WeightsPointerAttribute` is found on the current `ov::Constant` node. Instead, the weights metadata will be written as runtime information by calling the visit method corresponding to the attribute.
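
The flow described above can be sketched without any OpenVINO dependency. Everything in this block is a hypothetical stand-in: `ConstantNode` plays the role of an `ov::Constant`, `WeightsPointer` the role of `intel_npu::WeightsPointerAttribute`, and `serialize` the role of the serializer. It assumes, as the description states, that weights smaller than the threshold are copied and all others are recorded only as a pointer and a size.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for the weights metadata attribute: memory location and size only.
struct WeightsPointer {
    const void* ptr;
    std::size_t byteSize;
};

// Stand-in for a constant node that owns its weights buffer.
struct ConstantNode {
    std::string name;
    std::vector<std::uint8_t> data;
};

// Result of serialization: a buffer holding the copied weights, plus
// metadata entries for the weights that were left in place.
struct SerializedModel {
    std::vector<std::uint8_t> weightsBuffer;           // copied weights, back to back
    std::map<std::string, std::size_t> copiedOffsets;  // node name -> offset in weightsBuffer
    std::map<std::string, WeightsPointer> pointers;    // node name -> pointer & size only
};

SerializedModel serialize(const std::vector<ConstantNode>& constants, std::size_t threshold) {
    SerializedModel out;
    for (const auto& node : constants) {
        if (node.data.size() < threshold) {
            // Small buffer: copy the bytes into the dedicated weights buffer.
            out.copiedOffsets[node.name] = out.weightsBuffer.size();
            out.weightsBuffer.insert(out.weightsBuffer.end(), node.data.begin(), node.data.end());
        } else {
            // Otherwise: store only where the weights live and how large they are.
            out.pointers[node.name] = WeightsPointer{node.data.data(), node.data.size()};
        }
    }
    return out;
}
```

In this sketch the stored pointers are only valid while the original `constants` vector is alive and unmodified, so whoever consumes the metadata must do so before the source model is released.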