@razvanapetroaie
Contributor

@razvanapetroaie razvanapetroaie commented Sep 2, 2025

Details:

  • Extends the OV StreamSerializer & XmlSerializer in order to allow passing an ov::Model through the driver without copying its weights into a separate buffer.
  • The purpose of this PR is to reduce memory consumption by avoiding duplication of weights.
  • This feature will be disabled by default for the CiD interface (at least for a while). The changes are first meant to be integrated in the upcoming CiP interface.
  • The implementation followed and adapted the sample provided in this PR.
  • Two config options are introduced: intel_npu::use_base_model_serializer, a switch between the old & new serialization algorithms, and intel_npu::serialization_weights_size_threshold, which controls which weights are copied into a separate buffer and which ones have only metadata (memory location & size) stored as runtime information. More concretely, weights smaller than this value will be copied; all other weights are referenced through metadata only.
  • vcl_serializer.hpp is meant to contain all operations required to prepare an ov::Model as input for the VCL interface. This covers model serialization (using either the new or the old serializer) as well as I/O & config serialization. xml_serializer.hpp is a more generic, weightless (no weights copies unless serialization_weights_size_threshold is used) implementation of the OV serializer.
  • Roughly how this works:
    • The plugin passes through all ov::Constant nodes and places weights metadata (intel_npu::WeightsPointerAttribute) as runtime information on the nodes whose buffers are not smaller than serialization_weights_size_threshold. Smaller buffers are left untagged so that they get copied.
    • The new intel_npu::StreamSerialize is then called; it uses intel_npu::XmlSerializer to serialize the model. Note that StreamSerialize uses a slightly different format within the buffer (metadata containing offsets & sizes, custom data, weights & the XML graph); see ov::pass::StreamSerialize for details.
    • intel_npu::XmlSerializer will not write weights into its dedicated buffer if the WeightsPointerAttribute is found within the current ov::Constant node. Instead, weights metadata will be written as runtime information by calling the visit method corresponding to the attribute.
    • The deserializer will be able to distinguish between the two cases (weights copied vs. weights stored as metadata) by looking for this attribute in the serialized buffer.
  • See the ticket for some performance reports.
  • TODO: some functional tests
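
The flow described under "Roughly how this works" can be pictured with a small standalone sketch. This is not the code in this PR: ToyConstant, WeightsPointer, SerializedModel and the three functions are hypothetical stand-ins for ov::Constant, intel_npu::WeightsPointerAttribute and the serializer/deserializer, and the real buffer layout is different (see ov::pass::StreamSerialize). It only illustrates the "copy small weights, reference the rest" decision, assuming the rule that weights smaller than the threshold are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for intel_npu::WeightsPointerAttribute: location & size only.
struct WeightsPointer {
    const std::uint8_t* data;
    std::size_t size;
};

// Hypothetical stand-in for an ov::Constant node.
struct ToyConstant {
    std::vector<std::uint8_t> buffer;          // weights owned by the node
    std::optional<WeightsPointer> rt_pointer;  // "runtime information", set only if tagged
};

// One serialized record: either an offset into the copied-weights blob or a pointer.
struct SerializedConstant {
    std::optional<WeightsPointer> pointer;  // set when the weights were NOT copied
    std::size_t offset = 0;                 // meaningful only when pointer is empty
    std::size_t size = 0;
};

struct SerializedModel {
    std::vector<SerializedConstant> records;
    std::vector<std::uint8_t> weights_blob;  // holds only the copied weights
};

// Step 1: tag every constant whose buffer is not smaller than the threshold.
inline void tag_constants(std::vector<ToyConstant>& constants, std::size_t threshold) {
    for (auto& c : constants) {
        if (c.buffer.size() >= threshold) {
            c.rt_pointer = WeightsPointer{c.buffer.data(), c.buffer.size()};
        }
    }
}

// Steps 2-3: tagged constants contribute metadata only; untagged ones are copied.
inline SerializedModel serialize(const std::vector<ToyConstant>& constants) {
    SerializedModel out;
    for (const auto& c : constants) {
        SerializedConstant rec;
        rec.size = c.buffer.size();
        if (c.rt_pointer) {
            rec.pointer = c.rt_pointer;
        } else {
            rec.offset = out.weights_blob.size();
            out.weights_blob.insert(out.weights_blob.end(), c.buffer.begin(), c.buffer.end());
        }
        out.records.push_back(rec);
    }
    return out;
}

// Step 4: the reader tells the two cases apart by the presence of the pointer.
inline const std::uint8_t* resolve(const SerializedModel& model, std::size_t index) {
    const auto& rec = model.records[index];
    return rec.pointer ? rec.pointer->data : model.weights_blob.data() + rec.offset;
}
```

In this sketch the pointers stay valid only while the original ToyConstant buffers are alive and unmodified, which mirrors the requirement that the source ov::Model must outlive any consumer of the weightless buffer.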

Related PRs

Tickets:

@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: build OpenVINO cmake script / infra category: transformations OpenVINO Runtime library - Transformations category: IR FE OpenVINO IR v10 / v11 FrontEnd category: CPP API OpenVINO CPP API bindings category: NPU OpenVINO NPU plugin labels Sep 2, 2025
@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from 5cf161e to 2a06154 Compare September 17, 2025 13:39
@github-actions github-actions bot removed category: Core OpenVINO Core (aka ngraph) category: transformations OpenVINO Runtime library - Transformations category: IR FE OpenVINO IR v10 / v11 FrontEnd category: CPP API OpenVINO CPP API bindings labels Sep 17, 2025
@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from cd6598a to a33952f Compare September 18, 2025 16:49
@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from 3342d2c to ea58a81 Compare September 22, 2025 13:54
@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from faca59f to 7800811 Compare September 22, 2025 17:25
@razvanapetroaie razvanapetroaie changed the title [NPU] Cvs 169982 weights serialization [NPU] Model serialization/deserialization without weights copies Sep 24, 2025
@github-actions github-actions bot added the category: Core OpenVINO Core (aka ngraph) label Sep 25, 2025
@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from 87a0898 to f93ef2b Compare September 25, 2025 12:58
@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from ad5df75 to c37d8ae Compare October 1, 2025 14:22
@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from c37d8ae to a2362fe Compare October 2, 2025 09:44
}

static size_t defaultValue() {
return 0;
Contributor Author

Will tune this later to see which value yields the best performance. For now, we assume 0 is the best candidate (only weights pointers & sizes are stored).
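
To make the effect of the default concrete, here is a minimal sketch of the threshold rule, assuming that a weights buffer is copied only when its size is strictly smaller than the threshold. bytes_copied is a hypothetical helper written for illustration; it is not part of the plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: total number of bytes that would be copied into the
// separate weights buffer for a given threshold (sizes are in bytes).
inline std::size_t bytes_copied(const std::vector<std::size_t>& weight_sizes,
                                std::size_t threshold) {
    std::size_t total = 0;
    for (std::size_t size : weight_sizes) {
        if (size < threshold) {  // strictly smaller than the threshold: copied
            total += size;
        }
    }
    return total;
}
```

With a threshold of 0 no buffer can be smaller than the threshold, so nothing is copied and every constant is described by pointer & size alone. Raising the threshold trades extra copies of small buffers for fewer metadata entries.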

@razvanapetroaie razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from eed9db5 to 5bba959 Compare October 15, 2025 16:11
@moslex moslex added this to the 2025.4 milestone Oct 21, 2025
@PatrikStepan
Contributor

Decided to merge this into the master branch after the branch for OV25.4 is created. This PR alone brings no benefit to the release without further changes in the driver or the further plugin changes planned for OV26.0.

@PatrikStepan PatrikStepan removed this from the 2025.4 milestone Oct 21, 2025
Labels

category: build OpenVINO cmake script / infra category: NPU OpenVINO NPU plugin

6 participants