Ollama-openvino uses plain CPU instead of NPU/GPU on Intel Core Ultra 9 185H Linux system #988

@oparviai

Problem: I can run ollama-openvino in the provided Docker image, but it runs the model on the CPU only and does not use the NPU or GPU of the Intel Core Ultra processor.

Request:

  • Please help me work out how to use Intel's NPU or GPU with ollama-openvino,
  • or explain how I can investigate the issue further,
  • or let me know if ollama-openvino is incompatible with Intel Core Ultra 9 processors or with Linux systems, or cannot use hardware acceleration for some other reason.

--

Details:

I am using the ollama_openvino Docker image built from the Dockerfile provided on the Ollama-ov page:
https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/ollama_openvino

I run this in Docker on a native Linux PC with an Intel Core Ultra 9 185H processor and 64 GiB of RAM.

I have tested with "DeepSeek-R1-Distill-Qwen-14B-int4-ov" and a few other models. They run, but the system appears to use the CPU instead of the NPU or GPU, as deduced from:

  • 'top' shows a CPU load of ~600% while the model is working, indicating heavy CPU use
  • 'nvtop' shows very low GPU activity
  • 'nputop' shows no NPU activity at all
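
One further check that could narrow this down is to ask the OpenVINO runtime which devices it can see from inside the container. The sketch below assumes the OpenVINO Python package is installed in the container, which I have not verified for this image; the helper name list_openvino_devices is only illustrative.

```python
def list_openvino_devices():
    """Return {device: full name} as seen by OpenVINO, or None if the package is missing."""
    try:
        # Assumption: the OpenVINO Python package is available in this environment.
        import openvino as ov
    except ImportError:
        return None

    core = ov.Core()
    devices = {}
    for name in core.available_devices:
        try:
            devices[name] = core.get_property(name, "FULL_DEVICE_NAME")
        except Exception:
            # Some plugins may not expose a readable name; keep the device anyway.
            devices[name] = "(name unavailable)"
    return devices


if __name__ == "__main__":
    found = list_openvino_devices()
    if found is None:
        print("OpenVINO Python package not installed here")
    else:
        for device, full_name in found.items():
            print(f"{device}: {full_name}")
```

If only CPU is listed, the runtime inside the container cannot reach the GPU or NPU at all, which would point at drivers or device passthrough rather than at ollama-openvino itself.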

sudo dmesg | grep -e i915 -e vpu indicates that the Intel VPU and GPU were detected (dmesg was run on the host system, not inside Docker):

[    2.179935] intel_vpu 0000:00:0b.0: enabling device (0000 -> 0002)
[    2.205297] intel_vpu 0000:00:0b.0: [drm] Firmware: intel/vpu/vpu_37xx_v1.bin, version: 20250415*MTL_CLIENT_SILICON-release*1900*ci_tag_ud202518_vpu_rc_20250415_1900*7ef0f3fdb82
[    2.205300] intel_vpu 0000:00:0b.0: [drm] Scheduler mode: HW
[    2.298983] [drm] Initialized intel_vpu 1.0.0 for 0000:00:0b.0 on minor 0
[    2.887553] i915 0000:00:02.0: [drm] Found meteorlake (device ID 7d55) integrated display version 14.00 stepping C0
[    2.888710] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    2.916931] i915 0000:00:02.0: vgaarb: deactivate vga console
[    2.916960] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[    2.929356] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.945352] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/mtl_dmc.bin (v2.23)
[    2.958531] i915 0000:00:02.0: [drm] [CONNECTOR:241:eDP-1] Panel is missing HDR static metadata. Possible support for Intel HDR backlight interface is not used. If your backlight controls don't work try booting with i915.enable_dpcd_backlight=3. needs this, please file a _new_ bug report on drm/i915, see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
[    3.047743] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/mtl_guc_70.bin version 70.44.1
[    3.059917] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[    3.059920] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[    3.060126] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[    3.069357] i915 0000:00:02.0: [drm] GT1: GuC firmware i915/mtl_guc_70.bin version 70.44.1
[    3.069360] i915 0000:00:02.0: [drm] GT1: HuC firmware i915/mtl_huc_gsc.bin version 8.5.4
[    3.094860] i915 0000:00:02.0: [drm] GT1: HuC: authenticated for clear media
[    3.095313] i915 0000:00:02.0: [drm] GT1: GUC: submission enabled
[    3.095314] i915 0000:00:02.0: [drm] GT1: GUC: SLPC enabled
[    3.095402] i915 0000:00:02.0: [drm] GT1: GUC: RC enabled
[    3.100012] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[    3.113102] [drm] Initialized i915 1.6.0 for 0000:00:02.0 on minor 1
[    3.238197] i915 0000:00:02.0: [drm] GT1: Loaded GSC firmware i915/mtl_gsc_1.bin (cv1.0, r102.1.15.1926, svn 1)
[    3.258552] i915 0000:00:02.0: [drm] GT1: HuC: authenticated for all workloads
[    3.352911] fbcon: i915drmfb (fb0) is primary device
[    3.352914] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[    4.810133] mei_gsc_proxy 0000:00:16.0-0f73db04-97ab-4125-b893-e904ad0d5464: bound 0000:00:02.0 (ops i915_gsc_proxy_component_ops [i915])
[    4.944946] i915 0000:00:02.0: [drm] Selective fetch area calculation failed in pipe A
[    5.040641] sof-audio-pci-intel-mtl 0000:00:1f.3: bound 0000:00:02.0 (ops intel_audio_component_bind_ops [i915])
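
The dmesg output only shows that the host kernel loaded the drivers. A container additionally needs the device nodes passed through to it. As a minimal sketch, assuming the usual Linux locations (/dev/dri/renderD* for the GPU and /dev/accel/accel* for the NPU), the following can be run inside the container to see whether those nodes are visible and accessible. The function name and default patterns are my own choices.

```python
import glob
import os


def visible_device_nodes(patterns=("/dev/dri/renderD*", "/dev/accel/accel*")):
    """Map each glob pattern to a list of (node, read_write_ok) pairs."""
    report = {}
    for pattern in patterns:
        nodes = sorted(glob.glob(pattern))
        # os.access checks whether the current user may open the node read/write.
        report[pattern] = [(node, os.access(node, os.R_OK | os.W_OK)) for node in nodes]
    return report


if __name__ == "__main__":
    for pattern, nodes in visible_device_nodes().items():
        if not nodes:
            print(f"{pattern}: no nodes visible")
        for node, ok in nodes:
            print(f"{node}: {'read/write OK' if ok else 'no read/write permission'}")
```

An empty list for a pattern would suggest the container was started without the corresponding device mapping.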

The ollama serve console prints the following:

time=2025-08-17T14:14:25.853Z level=INFO source=sched.go:313 msg="loading first openvino model: DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest"
time=2025-08-17T14:14:25.853Z level=INFO source=genaiserver.go:147 msg="system memory" total="62.2 GiB" free="55.1 GiB" free_swap="8.0 GiB"
time=2025-08-17T14:14:25.856Z level=INFO source=genaiserver.go:105 msg="The device specified in the modelfile is not currently supported by GenAI. Now we use CPU"
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:270 msg="starting llama server" cmd="/usr/bin/ollama genairunner --model /root/.ollama/models/blobs/sha256-0609c3cb63c6b5e0bc06af3610bd605c34f41d9932744aca365fb781d183d633 --modelname DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest --device CPU --parallel 1 --port 38317"
time=2025-08-17T14:14:25.857Z level=INFO source=sched.go:548 msg="loaded runners" count=1                                                                                             
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:389 msg="waiting for llama runner to start responding"
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:423 msg="waiting for server to become available" status="llm server error"
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:473 msg="starting go genairunner" 
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:413 msg="The model is a OpenVINO IR file."
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:505 msg="Server listening on 127.0.0.1:38317"
time=2025-08-17T14:14:26.109Z level=INFO source=genaiserver.go:423 msg="waiting for server to become available" status="llm server loading model"
time=2025-08-17T14:15:25.628Z level=INFO source=runner.go:435 msg="The model had been load by GenAI, ov_model_path: /tmp/DeepSeek-R1-Distill-Qwen-14B-int4-ov_latest/DeepSeek-R1-Distill-Qwen-14B-int4-ov , CPU"
time=2025-08-17T14:15:25.848Z level=INFO source=genaiserver.go:428 msg="llama runner started in 59.99 seconds"
...
time=2025-08-17T14:16:56.176Z level=INFO source=genai.go:253 msg="Sampling Parameters - Temperature: 1.00, TopP: 1.00, TopK: 40, RepeatPenalty: 1.00"             
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:208 msg="Genai Metrics info:"                                                                                                
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:212 msg="Load time: 1941.00"                                                                                                 
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:217 msg="Generate time: 208454.66 _ 0.00 ms"                                                                                 
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:222 msg="Tokenization time: 1.72 _ 0.00 ms"                                                                                  
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:227 msg="Detokenization time: 0.14 _ 0.00 ms"                                   
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:232 msg="TTFT: 3648.45 _ 0.00 ms"                                                                                            
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:237 msg="TPOT: 153.37 _ 8.56 ms/token"     
time=2025-08-17T14:20:25.039Z level=INFO source=genai.go:241 msg="Num of generation tokens: 1339"                                                                                     
time=2025-08-17T14:20:25.039Z level=INFO source=genai.go:246 msg="Throughput: 6.52 _ 0.36 tokens/s" 
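
The two log lines that matter most here are the genaiserver.go:105 fallback message and the --device flag passed to the runner. A small sketch for pulling both out of a longer log follows; the function name and regular expressions are mine, and the sample lines are copied from the output above.

```python
import re

FALLBACK_TEXT = "not currently supported by GenAI"
DEVICE_FLAG = re.compile(r"--device\s+(\S+)")
MSG_FIELD = re.compile(r'msg="(.*?)"')


def summarize_device_selection(log_text):
    """Return the first fallback message and the first --device value found."""
    fallback = None
    device = None
    for line in log_text.splitlines():
        if fallback is None and FALLBACK_TEXT in line:
            match = MSG_FIELD.search(line)
            fallback = match.group(1) if match else line.strip()
        if device is None:
            match = DEVICE_FLAG.search(line)
            if match:
                device = match.group(1)
    return {"fallback_message": fallback, "runner_device": device}


SAMPLE_LOG = (
    'time=2025-08-17T14:14:25.856Z level=INFO source=genaiserver.go:105 '
    'msg="The device specified in the modelfile is not currently supported by GenAI. Now we use CPU"\n'
    'time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:270 '
    'msg="starting llama server" cmd="/usr/bin/ollama genairunner --model '
    '/root/.ollama/models/blobs/sha256-0609c3cb63c6b5e0bc06af3610bd605c34f41d9932744aca365fb781d183d633 '
    '--modelname DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest --device CPU --parallel 1 --port 38317"\n'
)

if __name__ == "__main__":
    print(summarize_device_selection(SAMPLE_LOG))
```

For the log above this reports a fallback message together with runner device CPU, so the server rejected the device requested in the Modelfile before the model was loaded.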
