Very long compile times for moderately sized matmuls #143

@GlowingScrewdriver

TL;DR

When I construct an OpenVINO model consisting of just a single, moderately sized square matmul operation (plus the parameter and result ops) and compile it for the NPU, the compile step takes a very long time: from 10 minutes to nearly two hours, depending on the matrix size.

The problem

I construct the model as follows:

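import openvino as ov
# Assumption: the report does not say which opset module `ops` refers to.
import openvino.runtime.opset13 as ops

def make_model (n):
    # Two n x n float16 inputs feeding a single matmul
    dtype = ov.Type ('float16')
    size = (n, n)

    A = ops.parameter (size, dtype, name = "A")
    B = ops.parameter (size, dtype, name = "B")
    # No transposition of either operand
    C = ops.matmul (A, B, False, False)
    res = ops.result (C)

    return ov.Model ([res], [A, B], "matmul")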

And I compile it as follows:

core = ov.Core ()
compiled_model = core.compile_model (make_model (11264), "NPU")
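
To separate compile time from model construction and inference, the compile call can be wrapped in a small timer. This is only a minimal sketch: `timed` is a hypothetical helper name, not part of OpenVINO, and the usage shown in the comments assumes the `make_model` and `core` objects from the snippets above.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage (requires OpenVINO and an NPU device):
#   model = make_model(11264)
#   compiled_model, seconds = timed(core.compile_model, model, "NPU")
#   print(f"compile took {seconds / 60:.1f} min")
```

Building the model outside the timed call keeps graph construction out of the measurement, so the reported figure reflects only the compile step.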

Some numbers:

  • n=11264 (i.e. matmul with 11264x11264 matrices) takes 10 minutes to compile
  • n=12288 takes 23 minutes
  • n=13312 takes 114 minutes
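
To put the growth in these numbers into perspective, the sketch below estimates the local scaling exponent between consecutive measurements, i.e. the k for which the time behaves like n^k over that interval. It uses only the three data points listed above; `local_exponents` is a hypothetical helper name, and three points are too few for anything more than a rough indication.

```python
import math

# (n, compile time in minutes), as reported above
timings = [(11264, 10), (12288, 23), (13312, 114)]

def local_exponents(points):
    """Return the log-log slope between each pair of consecutive (n, t) points."""
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(points, points[1:])
    ]

exponents = local_exponents(timings)
for (n1, _), (n2, _), k in zip(timings, timings[1:], exponents):
    print(f"n={n1} -> n={n2}: time ~ n^{k:.1f}")
```

The two slopes come out at roughly 10 and 20, whereas the arithmetic of the matmul itself only grows as n^3. The compile time is therefore growing far faster than the size of the problem, and the growth itself accelerates.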

After compilation, inference (i.e. running the matmul on random inputs) is as quick as I'd expect it to be. Subsequent runs with the same matrix size spend no time on compilation, presumably because they are served from the NPU model cache.

I've tried raising the thread limit for the compiler, but it still runs on a single thread.

System info
