Unable to Profile DMATask on NPU40XX #43

@ColorsWind

I am investigating the per-task time breakdown within a single model, but I cannot get profiling data for the DMATask. I have spent a week looking for whatever I might be missing and have not resolved the problem. Below are the minimal steps to reproduce the issue. The platform is an Intel Core Ultra 7 258V laptop with openvino-nightly and numpy installed.

Step 1: Create the Sigmoid Model

import numpy as np
import openvino as ov
from openvino.runtime import op, opset1

n_len = 128
ov_type = ov.Type.f16

A = op.Parameter(ov_type, ov.Shape([n_len]))
C = opset1.sigmoid(A)
model = ov.Model(C, [A])
ov.save_model(model, "sigmoid.xml")

Step 2: Compile the Sigmoid Model with DMA Profiling Flags

vpux-translate --vpu-arch=NPU40XX \
    --vpux-profiling \
    --mlir-print-debuginfo \
    --import-IE sigmoid.xml -o sigmoid.mlir
    
vpux-opt --vpu-arch=NPU40XX \
    --default-hw-mode="profiling=true dma-profiling=true" \
    --lower-VPUIP-to-ELF sigmoid.mlir \
    -o sigmoid_out.mlir
vpux-translate --vpu-arch=NPU40XX --export-ELF sigmoid_out.mlir -o sigmoid.blob

Step 3: Run the Model

import os
os.environ['ZE_INTEL_NPU_LOGLEVEL'] = 'ERROR'

import openvino as ov
import numpy as np

core = ov.Core()
core.set_property('NPU', {
    'PERF_COUNT': True,
})
with open('sigmoid.blob', 'rb') as f:
    blob = f.read()
model = core.import_model(blob, device_name='NPU')
req = model.create_infer_request()
req.infer(np.random.random(128).astype(np.float16))
prof_info = req.profiling_info[0]
print('status', prof_info.status)
print('real_time', prof_info.real_time)
print('cpu_time', prof_info.cpu_time)
print('node_name', prof_info.node_name)
print('exec_type', prof_info.exec_type)
print('node_type', prof_info.node_type)
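The script above prints only the first profiling entry. As a minimal sketch, assuming each entry exposes `node_name`, `exec_type`, and a `real_time` that is a `datetime.timedelta` (as in the OpenVINO Python API), the whole list can be printed and totalled per execution type. The helper names `summarize_by_exec_type` and `print_profiling` are made up for illustration and are not part of OpenVINO.

```python
from collections import defaultdict
from datetime import timedelta


def summarize_by_exec_type(entries):
    """Sum real_time per exec_type over a list of profiling entries."""
    totals = defaultdict(timedelta)  # missing keys start at timedelta(0)
    for entry in entries:
        totals[entry.exec_type] += entry.real_time
    return dict(totals)


def print_profiling(entries):
    """Print one line per entry, then one total per exec_type."""
    for entry in entries:
        print(f"{entry.node_name:<40} {entry.exec_type:<12} {entry.real_time}")
    for exec_type, total in sorted(summarize_by_exec_type(entries).items()):
        print(f"total[{exec_type}] = {total}")
```

Once profiling data can be decoded, calling `print_profiling(req.profiling_info)` would show whether any entry is attributed to DMA at all.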

You will encounter the following error:

NPU_LOG: ERROR [compiler.cpp:289] Failed to get decoded profiling data in compiler

When I instead submit tasks directly through the Level Zero API and read the profiling results through the Level Zero NPU extension API, the DMA profiling output is all zeros, while the DPU and SW-Kernel profiling outputs are valid. How can I obtain the DMA task profiling results?
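The "DMA output is zero" observation can be checked mechanically on a raw profiling buffer. The following is only a sketch: the section names, offsets, and sizes in `example_sections` are hypothetical placeholders, since the real layout depends on the compiled blob and is not shown in this report. The function name `find_zero_sections` is likewise made up for illustration.

```python
def find_zero_sections(buffer, sections):
    """Return the names of sections whose bytes are all zero.

    `sections` maps a section name to an (offset, size) pair in bytes.
    """
    zero = []
    for name, (offset, size) in sections.items():
        chunk = buffer[offset:offset + size]
        if len(chunk) != size:
            raise ValueError(
                f"section {name!r} exceeds buffer of {len(buffer)} bytes"
            )
        if not any(chunk):  # every byte in the slice is 0
            zero.append(name)
    return zero


# Hypothetical layout, for illustration only; not the real NPU40XX layout.
example_sections = {"dma": (0, 64), "dpu": (64, 64), "sw_kernel": (128, 64)}
```

With the real offsets substituted, a result of `["dma"]` would match the behavior described above: DMA records never written, DPU and SW-Kernel records populated.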

Labels: bug, help wanted
