[NPU] Model serialization/deserialization without weights copies #31939

razvanapetroaie · 2025-09-02T15:29:04Z

Details:

Extends the OV StreamSerializer & XmlSerializer in order to allow passing an ov::Model through the driver without copying its weights into a separate buffer.
The purpose of this PR is to reduce memory consumption by avoiding weight duplication.
This feature will be disabled by default for the CiD interface (at least for a while). The changes are first meant to be integrated in the upcoming CiP interface.
The implementation followed and adapted the sample provided in this PR.
Two config options are introduced: intel_npu::use_base_model_serializer, a switch between the old & new serialization algorithms, and intel_npu::serialization_weights_size_threshold, which controls which weights are copied into a separate buffer and which ones have only metadata (memory location & size) stored as runtime information. More concretely, weights smaller than this value will be copied.
vcl_serializer.hpp is meant to contain all operations required to prepare an ov::Model for the VCL interface: model serialization (with either the new or the old serializer), plus I/O & config serialization. xml_serializer.hpp is a more generic, weightless implementation of the OV serializer (no weights copies unless serialization_weights_size_threshold is used).
Roughly how this works:
- The plugin passes through all ov::Constant nodes and places weights metadata (intel_npu::WeightsPointerAttribute) as runtime information on the nodes whose buffers are not smaller than serialization_weights_size_threshold (smaller buffers are copied instead).
- The new intel_npu::StreamSerialize is then called; it uses intel_npu::XmlSerializer to serialize the model. Note that StreamSerialize uses a slightly different format within the buffer (metadata containing offsets & sizes, custom data, weights & the XML graph), see ov::pass::StreamSerialize for details.
- intel_npu::XmlSerializer will not write weights into its dedicated buffer if the WeightsPointerAttribute is found within the current ov::Constant node. Instead, weights metadata will be written as runtime information by calling the visit method corresponding to the attribute.
- The deserializer will be able to distinguish between the two cases (weights copied vs. weights stored as metadata) by looking for this attribute in the serialized buffer.
See the ticket for some performance reports.
TODO: some functional tests

Related PRs

Tickets:

CVS-173711

razvanapetroaie · 2025-10-09T09:42:49Z

src/plugins/intel_npu/src/al/include/intel_npu/config/options.hpp

+    }
+
+    static size_t defaultValue() {
+        return 0;


Will tune this later to see which value yields the best performance. For now, we assume 0 is the best candidate (only weights pointers & sizes are stored).

src/plugins/intel_npu/src/al/include/intel_npu/config/options.hpp

src/plugins/intel_npu/src/compiler_adapter/src/driver_compiler_adapter.cpp

src/plugins/intel_npu/src/compiler_adapter/src/vcl_serializer.cpp

src/plugins/intel_npu/tests/functional/behavior/npu_driver_compiler_adapter/custom_stream.cpp

src/plugins/intel_npu/src/compiler_adapter/include/vcl_serializer.hpp

src/plugins/intel_npu/src/compiler_adapter/src/vcl_serializer.cpp

src/plugins/intel_npu/src/al/include/intel_npu/weights_pointer_attribute.hpp

src/plugins/intel_npu/src/compiler_adapter/include/vcl_serializer.hpp

PatrikStepan · 2025-10-21T11:04:03Z

Decided to merge this into the master branch after the branch for OV25.4 is created. This PR alone brings no benefits to the release without further changes in the driver or without the further plugin changes planned for OV26.0.

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from 5cf161e to 2a06154 Compare September 17, 2025 13:39

github-actions bot removed category: Core OpenVINO Core (aka ngraph) category: transformations OpenVINO Runtime library - Transformations category: IR FE OpenVINO IR v10 / v11 FrontEnd category: CPP API OpenVINO CPP API bindings labels Sep 17, 2025

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from cd6598a to a33952f Compare September 18, 2025 16:49

razvanapetroaie added 4 commits September 22, 2025 13:54

Implementing the NPU plugin deserializer, it doesn't do anything special yet

d717f92

Addinge a serializer that does nothing special

55adf77

Adding a new config option

8f3e760

Starting to refactor the plugin-driver adapter

ea58a81

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from 3342d2c to ea58a81 Compare September 22, 2025 13:54

Done refactoring

7800811

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from faca59f to 7800811 Compare September 22, 2025 17:25

razvanapetroaie added 2 commits September 24, 2025 07:43

Tweaking the deserializer. First weights-copy solution that seems to work

7393f90

Adding the same extensions used by the driver-compiler adapter

f2e1e64

razvanapetroaie changed the title ~~[NPU] Cvs 169982 weights serialization~~ [NPU] Model serialization/deserialization without weights copies Sep 24, 2025

Storing the first serializer attempt

553b23e

github-actions bot added the category: Core OpenVINO Core (aka ngraph) label Sep 25, 2025

Second attempt

f93ef2b

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from 87a0898 to f93ef2b Compare September 25, 2025 12:58

First solution that seems to be working

63117b1

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from ad5df75 to c37d8ae Compare October 1, 2025 14:22

Merge remote-tracking branch 'upstream/master' into CVS-169982-weights-serialization

a2362fe

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from c37d8ae to a2362fe Compare October 2, 2025 09:44

razvanapetroaie requested a review from a team as a code owner October 9, 2025 08:55

razvanapetroaie requested review from ShaojieZhuIntel, XinWangIntel, lmielick and praasz October 9, 2025 08:58

razvanapetroaie commented Oct 9, 2025

more test tweak

0b8f541

XinWangIntel reviewed Oct 13, 2025

praasz reviewed Oct 13, 2025

src/plugins/intel_npu/src/compiler_adapter/include/vcl_serializer.hpp (resolved)

src/plugins/intel_npu/src/compiler_adapter/src/vcl_serializer.cpp (resolved)

PatrikStepan reviewed Oct 15, 2025

src/plugins/intel_npu/src/al/include/intel_npu/weights_pointer_attribute.hpp (outdated, resolved)

src/plugins/intel_npu/src/compiler_adapter/include/vcl_serializer.hpp (resolved)

razvanapetroaie added 2 commits October 15, 2025 16:00

Merge remote-tracking branch 'upstream/master' into CVS-169982-weights-serialization

695f227

just comments and attribute tags

5bba959

razvanapetroaie force-pushed the CVS-169982-weights-serialization branch from eed9db5 to 5bba959 Compare October 15, 2025 16:11

razvanapetroaie added 4 commits October 15, 2025 16:28

virtual dtor

742ee91

Basic test for weightless serializer

afca0f8

reduced copy-pasta in the "serialize_model_to_stream" functions

bf11a90

just a comment

38ca6e6

moslex added this to the 2025.4 milestone Oct 21, 2025

moslex added the Code Freeze label Oct 21, 2025

PatrikStepan removed this from the 2025.4 milestone Oct 21, 2025

PatrikStepan removed the Code Freeze label Oct 21, 2025

XinWangIntel approved these changes Oct 21, 2025

ShaojieZhuIntel approved these changes Oct 21, 2025

Reusing the weightless writer -> significant time boost if weights are copied

352f853

praasz approved these changes Oct 24, 2025

razvanapetroaie added 4 commits October 27, 2025 13:46

Merge remote-tracking branch 'upstream/master' into CVS-169982-weights-serialization

9fd1b2b

post-merge build fix

5fdf849

ubuntu measurements

24aa96d

windows measurements

7f892aa

6 participants