TL;DR
When I construct an OpenVINO model consisting of a single, moderately sized square matmul (plus the parameter and result ops) and compile it for the NPU, the compile step takes a very long time: from 10 minutes to almost 2 hours, depending on the matrix size.
The problem
I construct the model as follows:
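    import openvino as ov
    import openvino.opset13 as ops  # assumption: the exact opset behind `ops` is not stated

    def make_model(n):
        # Two n x n float16 inputs feeding a single MatMul (no transposes)
        dtype = ov.Type('float16')
        size = (n, n)
        A = ops.parameter(size, dtype, name="A")
        B = ops.parameter(size, dtype, name="B")
        C = ops.matmul(A, B, False, False)
        res = ops.result(C)
        return ov.Model([res], [A, B], "matmul")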
And I compile it as follows:
    core = ov.Core()
    compiled_model = core.compile_model(make_model(11264), "NPU")
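A minimal sketch of how per-size compile times can be measured. The helper name `time_compiles` and its `compile_fn` parameter are hypothetical, introduced only for illustration; the helper times whatever callable it is given, so it does not depend on OpenVINO itself.

```python
import time
from typing import Callable, Dict, Iterable


def time_compiles(sizes: Iterable[int],
                  compile_fn: Callable[[int], object]) -> Dict[int, float]:
    """Return the wall-clock seconds spent in compile_fn(n) for each n."""
    timings = {}
    for n in sizes:
        start = time.perf_counter()
        compile_fn(n)  # hypothetical callable that builds and compiles the model
        timings[n] = time.perf_counter() - start
    return timings
```

With the snippets above, it could be called as `time_compiles([11264, 12288, 13312], lambda n: core.compile_model(make_model(n), "NPU"))`. To measure cold compile times, each size presumably has to be compiled for the first time, since repeated runs may be served from a cache.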
Some numbers:
- n=11264 (i.e. matmul with 11264x11264 matrices) takes 10 minutes to compile
- n=12288 takes 23 minutes
- n=13312 takes 114 minutes
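To put the growth in perspective, the sketch below estimates the local scaling exponent k (assuming compile time behaves like n^k between two consecutive measurements) from the three data points above. The power-law model and the helper name `local_exponents` are assumptions made for illustration, not something the compiler is known to follow.

```python
import math

# (n, compile time in minutes), taken from the list above
measurements = [(11264, 10), (12288, 23), (13312, 114)]


def local_exponents(points):
    """Exponent k such that t ~ n**k between each pair of consecutive points."""
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(points, points[1:])
    ]


exponents = local_exponents(measurements)
for (n1, _), (n2, _), k in zip(measurements, measurements[1:], exponents):
    print(f"n={n1} -> n={n2}: t ~ n^{k:.1f}")
```

For comparison, the data size of a square matmul grows as n^2 and its arithmetic work as n^3. Both estimated exponents come out far above 3, so the compile time grows much faster than the problem itself.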
After compilation, inference (i.e. running the matmul on random inputs) is as quick as I'd expect. Subsequent runs with the same matrix size spend no time on compilation, presumably because they are served from the NPU model cache.
I've tried raising the thread limit for the compiler, but compilation still runs on a single thread.
System info
- OpenVINO 2025.2.0 and OpenVINO 2025.3.0.dev20250729 (nightly)
- Linux NPU Driver v1.19.0, which I understand includes the NPU Compiler 2025.24
- Ubuntu 24.10 Oracular, with Linux 6.11.29
- NPU device architecture: 4000