Description
Problem: I can run ollama-openvino in the provided Docker image, but it runs the model on the CPU only and does not utilize the NPU or GPU of the Intel Core Ultra processor.
Requests (any of the following would help):
- Please help me figure out how to make ollama-openvino use the Intel NPU or GPU,
- or provide information on how I can investigate the issue further,
- or let me know if ollama-openvino is incompatible with Intel Core Ultra 9 processors or Linux systems, or cannot use hardware acceleration for some other reason.
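One way to investigate further, sketched under assumptions: if the OpenVINO Python runtime (the `openvino` package) is importable inside the container, which the ollama_openvino image may not provide, `Core().available_devices` lists the devices OpenVINO itself can reach. If only `CPU` is listed, the runtime inside the container cannot see the GPU or NPU at all. Both helper names below are hypothetical.

```python
from typing import Iterable, List


def missing_accelerators(available: Iterable[str],
                         wanted: Iterable[str] = ("GPU", "NPU")) -> List[str]:
    """Return the wanted device types that do not appear in `available`.

    Assumption: OpenVINO may report several devices of one type as
    "GPU.0", "GPU.1", ..., so a "<TYPE>." prefix also counts as present.
    """
    devices = list(available)
    return [
        w for w in wanted
        if not any(d == w or d.startswith(w + ".") for d in devices)
    ]


def list_openvino_devices() -> List[str]:
    """Print and return the devices visible to OpenVINO (requires `openvino`)."""
    import openvino as ov  # imported lazily so missing_accelerators works without it

    core = ov.Core()
    for device in core.available_devices:
        # FULL_DEVICE_NAME is a read-only device property in OpenVINO
        print(device, core.get_property(device, "FULL_DEVICE_NAME"))
    return list(core.available_devices)
```

Running `print(missing_accelerators(list_openvino_devices()))` inside the container and getting `['GPU', 'NPU']` would suggest the problem is device visibility rather than the model itself.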
--
Details:
I am using the ollama_openvino Docker image built from the Dockerfile provided on the Ollama-ov page:
https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/ollama_openvino
I run it with Docker on a native Linux PC with an Intel Core Ultra 9 185H processor and 64 GiB of RAM.
I have tested with "DeepSeek-R1-Distill-Qwen-14B-int4-ov" and a few other models. They run, but the system seems to use the CPU instead of the NPU or GPU, as deduced from the following:
- 'top' shows a CPU load of ~600% while the model is generating, indicating heavy CPU use
- 'nvtop' shows very low GPU activity
- 'nputop' shows no NPU activity at all
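Because the model runs inside Docker, a quick check is whether the container can see the accelerator device nodes at all. This sketch assumes the usual Linux locations: GPU render nodes under `/dev/dri` (`renderD*`) and NPU nodes under `/dev/accel` (`accel*`). The function name `visible_device_nodes` is hypothetical. Run it inside the container; if both lists are empty, the container was likely started without those devices being passed through (for example via Docker's `--device` option).

```python
import glob
import os
from typing import Dict, List


def visible_device_nodes(dev_root: str = "/dev") -> Dict[str, List[str]]:
    """List GPU render nodes and NPU accel nodes under dev_root.

    The node paths are assumptions about a typical Linux setup.
    """
    patterns = {
        "gpu": os.path.join(dev_root, "dri", "renderD*"),
        "npu": os.path.join(dev_root, "accel", "accel*"),
    }
    # glob returns [] when the directory does not exist, so missing devices show up as empty lists
    return {kind: sorted(glob.glob(pattern)) for kind, pattern in patterns.items()}


if __name__ == "__main__":
    for kind, nodes in visible_device_nodes().items():
        print(kind, nodes or "not visible")
```

Comparing the output on the host with the output inside the container separates a driver problem from a container configuration problem.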
Running 'sudo dmesg | grep -e i915 -e vpu' on the host system (not inside Docker) indicates that the Intel VPU (NPU) and GPU were detected:
[ 2.179935] intel_vpu 0000:00:0b.0: enabling device (0000 -> 0002)
[ 2.205297] intel_vpu 0000:00:0b.0: [drm] Firmware: intel/vpu/vpu_37xx_v1.bin, version: 20250415*MTL_CLIENT_SILICON-release*1900*ci_tag_ud202518_vpu_rc_20250415_1900*7ef0f3fdb82
[ 2.205300] intel_vpu 0000:00:0b.0: [drm] Scheduler mode: HW
[ 2.298983] [drm] Initialized intel_vpu 1.0.0 for 0000:00:0b.0 on minor 0
[ 2.887553] i915 0000:00:02.0: [drm] Found meteorlake (device ID 7d55) integrated display version 14.00 stepping C0
[ 2.888710] i915 0000:00:02.0: [drm] VT-d active for gfx access
[ 2.916931] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 2.916960] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[ 2.929356] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 2.945352] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/mtl_dmc.bin (v2.23)
[ 2.958531] i915 0000:00:02.0: [drm] [CONNECTOR:241:eDP-1] Panel is missing HDR static metadata. Possible support for Intel HDR backlight interface is not used. If your backlight controls don't work try booting with i915.enable_dpcd_backlight=3. needs this, please file a _new_ bug report on drm/i915, see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
[ 3.047743] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/mtl_guc_70.bin version 70.44.1
[ 3.059917] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[ 3.059920] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[ 3.060126] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[ 3.069357] i915 0000:00:02.0: [drm] GT1: GuC firmware i915/mtl_guc_70.bin version 70.44.1
[ 3.069360] i915 0000:00:02.0: [drm] GT1: HuC firmware i915/mtl_huc_gsc.bin version 8.5.4
[ 3.094860] i915 0000:00:02.0: [drm] GT1: HuC: authenticated for clear media
[ 3.095313] i915 0000:00:02.0: [drm] GT1: GUC: submission enabled
[ 3.095314] i915 0000:00:02.0: [drm] GT1: GUC: SLPC enabled
[ 3.095402] i915 0000:00:02.0: [drm] GT1: GUC: RC enabled
[ 3.100012] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[ 3.113102] [drm] Initialized i915 1.6.0 for 0000:00:02.0 on minor 1
[ 3.238197] i915 0000:00:02.0: [drm] GT1: Loaded GSC firmware i915/mtl_gsc_1.bin (cv1.0, r102.1.15.1926, svn 1)
[ 3.258552] i915 0000:00:02.0: [drm] GT1: HuC: authenticated for all workloads
[ 3.352911] fbcon: i915drmfb (fb0) is primary device
[ 3.352914] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[ 4.810133] mei_gsc_proxy 0000:00:16.0-0f73db04-97ab-4125-b893-e904ad0d5464: bound 0000:00:02.0 (ops i915_gsc_proxy_component_ops [i915])
[ 4.944946] i915 0000:00:02.0: [drm] Selective fetch area calculation failed in pipe A
[ 5.040641] sof-audio-pci-intel-mtl 0000:00:1f.3: bound 0000:00:02.0 (ops intel_audio_component_bind_ops [i915])
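The most relevant lines in this dmesg output are the `[drm] Initialized ...` entries, which show that both `intel_vpu` (the NPU driver) and `i915` (the GPU driver) bound to their PCI devices on the host. A small sketch that extracts those entries from a saved dmesg dump, assuming the message format shown above; `initialized_drm_drivers` is a hypothetical helper name.

```python
import re
from typing import Dict, List

# Matches kernel lines such as:
# [ 2.298983] [drm] Initialized intel_vpu 1.0.0 for 0000:00:0b.0 on minor 0
DRM_INIT = re.compile(
    r"\[drm\] Initialized (?P<driver>\S+) (?P<version>\S+) "
    r"for (?P<pci>\S+) on minor (?P<minor>\d+)"
)


def initialized_drm_drivers(dmesg_text: str) -> List[Dict[str, str]]:
    """Return one dict (driver, version, pci, minor) per '[drm] Initialized' line."""
    return [m.groupdict() for m in DRM_INIT.finditer(dmesg_text)]


if __name__ == "__main__":
    sample = (
        "[ 2.298983] [drm] Initialized intel_vpu 1.0.0 for 0000:00:0b.0 on minor 0\n"
        "[ 3.113102] [drm] Initialized i915 1.6.0 for 0000:00:02.0 on minor 1\n"
    )
    for entry in initialized_drm_drivers(sample):
        print(entry["driver"], entry["pci"], "minor", entry["minor"])
```

Host-side detection confirms the drivers are loaded; it says nothing about whether the container or the OpenVINO runtime can use them.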
The 'ollama serve' console output is as follows:
time=2025-08-17T14:14:25.853Z level=INFO source=sched.go:313 msg="loading first openvino model: DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest"
time=2025-08-17T14:14:25.853Z level=INFO source=genaiserver.go:147 msg="system memory" total="62.2 GiB" free="55.1 GiB" free_swap="8.0 GiB"
time=2025-08-17T14:14:25.856Z level=INFO source=genaiserver.go:105 msg="The device specified in the modelfile is not currently supported by GenAI. Now we use CPU"
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:270 msg="starting llama server" cmd="/usr/bin/ollama genairunner --model /root/.ollama/models/blobs/sha256-0609c3cb63c6b5e0bc06af3610bd605c34f41d9932744aca365fb781d183d633 --modelname DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest --device CPU --parallel 1 --port 38317"
time=2025-08-17T14:14:25.857Z level=INFO source=sched.go:548 msg="loaded runners" count=1
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:389 msg="waiting for llama runner to start responding"
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:423 msg="waiting for server to become available" status="llm server error"
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:473 msg="starting go genairunner"
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:413 msg="The model is a OpenVINO IR file."
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:505 msg="Server listening on 127.0.0.1:38317"
time=2025-08-17T14:14:26.109Z level=INFO source=genaiserver.go:423 msg="waiting for server to become available" status="llm server loading model"
time=2025-08-17T14:15:25.628Z level=INFO source=runner.go:435 msg="The model had been load by GenAI, ov_model_path: /tmp/DeepSeek-R1-Distill-Qwen-14B-int4-ov_latest/DeepSeek-R1-Distill-Qwen-14B-int4-ov , CPU"
time=2025-08-17T14:15:25.848Z level=INFO source=genaiserver.go:428 msg="llama runner started in 59.99 seconds"
...
time=2025-08-17T14:16:56.176Z level=INFO source=genai.go:253 msg="Sampling Parameters - Temperature: 1.00, TopP: 1.00, TopK: 40, RepeatPenalty: 1.00"
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:208 msg="Genai Metrics info:"
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:212 msg="Load time: 1941.00"
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:217 msg="Generate time: 208454.66 _ 0.00 ms"
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:222 msg="Tokenization time: 1.72 _ 0.00 ms"
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:227 msg="Detokenization time: 0.14 _ 0.00 ms"
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:232 msg="TTFT: 3648.45 _ 0.00 ms"
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:237 msg="TPOT: 153.37 _ 8.56 ms/token"
time=2025-08-17T14:20:25.039Z level=INFO source=genai.go:241 msg="Num of generation tokens: 1339"
time=2025-08-17T14:20:25.039Z level=INFO source=genai.go:246 msg="Throughput: 6.52 _ 0.36 tokens/s"
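The key line in this log is the `genaiserver.go:105` message: the device given in the Modelfile was reported as unsupported, and the runner was then started with `--device CPU`. The throughput figure is consistent with the per-token latency: 1000 / 153.37 ms/token ≈ 6.52 tokens/s. A sketch that extracts both facts from a saved `ollama serve` log, assuming the message wording shown above (it may differ between ollama_openvino versions); `summarize_serve_log` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

DEVICE_RE = re.compile(r"--device (\S+)")  # device passed to `genairunner`
TPOT_RE = re.compile(r"TPOT: ([\d.]+)")    # mean time per output token, in ms
FALLBACK_MSG = "is not currently supported by GenAI"


def summarize_serve_log(log_text: str) -> Dict[str, object]:
    """Report the runner device, whether the CPU fallback fired, and 1000/TPOT."""
    device = DEVICE_RE.search(log_text)
    tpot = TPOT_RE.search(log_text)
    tpot_ms: Optional[float] = float(tpot.group(1)) if tpot else None
    return {
        "device": device.group(1) if device else None,
        "fell_back_to_cpu": FALLBACK_MSG in log_text,
        "tpot_ms": tpot_ms,
        # tokens/s implied by the decode latency, to cross-check "Throughput"
        "tokens_per_s_from_tpot": 1000.0 / tpot_ms if tpot_ms else None,
    }


if __name__ == "__main__":
    sample = (
        'msg="The device specified in the modelfile is not currently supported by GenAI. Now we use CPU"\n'
        'cmd="/usr/bin/ollama genairunner --modelname m --device CPU --parallel 1 --port 38317"\n'
        'msg="TPOT: 153.37 _ 8.56 ms/token"\n'
    )
    print(summarize_serve_log(sample))
```

If `fell_back_to_cpu` is true, the measured CPU load is expected; the open question is then which device values the Modelfile accepts.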