Ollama-openvino uses plain CPU instead of NPU/GPU on Intel Core Ultra 9 185H Linux system

Problem: I can run ollama-openvino in the provided Docker image, but it runs the model on the CPU only and does not use the NPU or GPU of the Intel Core Ultra processor.

Requests:
- Please help me work out how to use Intel's NPU or GPU with ollama-openvino,
- or tell me how I can investigate the issue further,
- or let me know if ollama-openvino is not compatible with Intel Core Ultra 9 processors or with Linux systems, or cannot use hardware acceleration for some other reason.

--

Details:

I am using the ollama_openvino Docker image, built from the Dockerfile provided on the Ollama-ov page:
https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/ollama_openvino

I run this in Docker on a native Linux PC with an Intel Core Ultra 9 185H processor and 64 GiB of RAM.

I have tested with "DeepSeek-R1-Distill-Qwen-14B-int4-ov" and a few other models. They run, but the system seems to use the CPU instead of the NPU or GPU. I deduce this from:
- 'top' shows a CPU load of ~600% while the model is working, i.e. heavy CPU use
- 'nvtop' shows very low GPU activity
- 'nputop' shows no NPU activity at all

`sudo dmesg | grep -e i915 -e vpu` indicates that the Intel VPU and GPU were detected (dmesg was run on the host system, not inside Docker):
```
[    2.179935] intel_vpu 0000:00:0b.0: enabling device (0000 -> 0002)
[    2.205297] intel_vpu 0000:00:0b.0: [drm] Firmware: intel/vpu/vpu_37xx_v1.bin, version: 20250415*MTL_CLIENT_SILICON-release*1900*ci_tag_ud202518_vpu_rc_20250415_1900*7ef0f3fdb82
[    2.205300] intel_vpu 0000:00:0b.0: [drm] Scheduler mode: HW
[    2.298983] [drm] Initialized intel_vpu 1.0.0 for 0000:00:0b.0 on minor 0
[    2.887553] i915 0000:00:02.0: [drm] Found meteorlake (device ID 7d55) integrated display version 14.00 stepping C0
[    2.888710] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    2.916931] i915 0000:00:02.0: vgaarb: deactivate vga console
[    2.916960] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[    2.929356] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.945352] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/mtl_dmc.bin (v2.23)
[    2.958531] i915 0000:00:02.0: [drm] [CONNECTOR:241:eDP-1] Panel is missing HDR static metadata. Possible support for Intel HDR backlight interface is not used. If your backlight controls don't work try booting with i915.enable_dpcd_backlight=3. needs this, please file a _new_ bug report on drm/i915, see https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html for details.
[    3.047743] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/mtl_guc_70.bin version 70.44.1
[    3.059917] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[    3.059920] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[    3.060126] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[    3.069357] i915 0000:00:02.0: [drm] GT1: GuC firmware i915/mtl_guc_70.bin version 70.44.1
[    3.069360] i915 0000:00:02.0: [drm] GT1: HuC firmware i915/mtl_huc_gsc.bin version 8.5.4
[    3.094860] i915 0000:00:02.0: [drm] GT1: HuC: authenticated for clear media
[    3.095313] i915 0000:00:02.0: [drm] GT1: GUC: submission enabled
[    3.095314] i915 0000:00:02.0: [drm] GT1: GUC: SLPC enabled
[    3.095402] i915 0000:00:02.0: [drm] GT1: GUC: RC enabled
[    3.100012] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[    3.113102] [drm] Initialized i915 1.6.0 for 0000:00:02.0 on minor 1
[    3.238197] i915 0000:00:02.0: [drm] GT1: Loaded GSC firmware i915/mtl_gsc_1.bin (cv1.0, r102.1.15.1926, svn 1)
[    3.258552] i915 0000:00:02.0: [drm] GT1: HuC: authenticated for all workloads
[    3.352911] fbcon: i915drmfb (fb0) is primary device
[    3.352914] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[    4.810133] mei_gsc_proxy 0000:00:16.0-0f73db04-97ab-4125-b893-e904ad0d5464: bound 0000:00:02.0 (ops i915_gsc_proxy_component_ops [i915])
[    4.944946] i915 0000:00:02.0: [drm] Selective fetch area calculation failed in pipe A
[    5.040641] sof-audio-pci-intel-mtl 0000:00:1f.3: bound 0000:00:02.0 (ops intel_audio_component_bind_ops [i915])
```
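The dmesg output above comes from the host, so it does not show whether the container can reach the same devices. A minimal sketch of a check that could be run inside the container follows. It rests on my own assumptions, not on anything in the ollama_openvino docs: that the GPU appears as `/dev/dri/renderD*` and the NPU as `/dev/accel/accel*`, which is where the i915 and intel_vpu drivers normally create their device nodes on Linux. The function name `find_device_nodes` is hypothetical.

```python
import glob
import os

# Assumed device node patterns (my assumption, not from the ollama_openvino docs):
# i915 normally exposes render nodes under /dev/dri, intel_vpu under /dev/accel.
DEVICE_PATTERNS = {
    "GPU": "/dev/dri/renderD*",
    "NPU": "/dev/accel/accel*",
}


def find_device_nodes(patterns=DEVICE_PATTERNS):
    """Return {name: [(path, read_write_ok), ...]} for each glob pattern."""
    found = {}
    for name, pattern in patterns.items():
        nodes = sorted(glob.glob(pattern))
        # os.access reports whether the current user may open the node read/write.
        found[name] = [(path, os.access(path, os.R_OK | os.W_OK)) for path in nodes]
    return found


if __name__ == "__main__":
    for name, nodes in find_device_nodes().items():
        if not nodes:
            print(f"{name}: no device node visible")
        for path, ok in nodes:
            print(f"{name}: {path} read/write={'yes' if ok else 'no'}")
```

If this prints "no device node visible" inside the container while the same script finds nodes on the host, the devices are probably not being passed through to the container. That would be a separate question from what the runtime itself supports.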

The `ollama serve` console prints the following:
```
time=2025-08-17T14:14:25.853Z level=INFO source=sched.go:313 msg="loading first openvino model: DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest"
time=2025-08-17T14:14:25.853Z level=INFO source=genaiserver.go:147 msg="system memory" total="62.2 GiB" free="55.1 GiB" free_swap="8.0 GiB"
time=2025-08-17T14:14:25.856Z level=INFO source=genaiserver.go:105 msg="The device specified in the modelfile is not currently supported by GenAI. Now we use CPU"
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:270 msg="starting llama server" cmd="/usr/bin/ollama genairunner --model /root/.ollama/models/blobs/sha256-0609c3cb63c6b5e0bc06af3610bd605c34f41d9932744aca365fb781d183d633 --modelname DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest --device CPU --parallel 1 --port 38317"
time=2025-08-17T14:14:25.857Z level=INFO source=sched.go:548 msg="loaded runners" count=1                                                                                             
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:389 msg="waiting for llama runner to start responding"
time=2025-08-17T14:14:25.857Z level=INFO source=genaiserver.go:423 msg="waiting for server to become available" status="llm server error"
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:473 msg="starting go genairunner" 
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:413 msg="The model is a OpenVINO IR file."
time=2025-08-17T14:14:25.865Z level=INFO source=runner.go:505 msg="Server listening on 127.0.0.1:38317"
time=2025-08-17T14:14:26.109Z level=INFO source=genaiserver.go:423 msg="waiting for server to become available" status="llm server loading model"
time=2025-08-17T14:15:25.628Z level=INFO source=runner.go:435 msg="The model had been load by GenAI, ov_model_path: /tmp/DeepSeek-R1-Distill-Qwen-14B-int4-ov_latest/DeepSeek-R1-Distill-Qwen-14B-int4-ov , CPU"
time=2025-08-17T14:15:25.848Z level=INFO source=genaiserver.go:428 msg="llama runner started in 59.99 seconds"
...
time=2025-08-17T14:16:56.176Z level=INFO source=genai.go:253 msg="Sampling Parameters - Temperature: 1.00, TopP: 1.00, TopK: 40, RepeatPenalty: 1.00"             
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:208 msg="Genai Metrics info:"                                                                                                
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:212 msg="Load time: 1941.00"                                                                                                 
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:217 msg="Generate time: 208454.66 _ 0.00 ms"                                                                                 
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:222 msg="Tokenization time: 1.72 _ 0.00 ms"                                                                                  
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:227 msg="Detokenization time: 0.14 _ 0.00 ms"                                   
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:232 msg="TTFT: 3648.45 _ 0.00 ms"                                                                                            
time=2025-08-17T14:20:25.038Z level=INFO source=genai.go:237 msg="TPOT: 153.37 _ 8.56 ms/token"     
time=2025-08-17T14:20:25.039Z level=INFO source=genai.go:241 msg="Num of generation tokens: 1339"                                                                                     
time=2025-08-17T14:20:25.039Z level=INFO source=genai.go:246 msg="Throughput: 6.52 _ 0.36 tokens/s" 
```
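Two lines in this log look most relevant to me: the fallback message from `genaiserver.go:105` ("The device specified in the modelfile is not currently supported by GenAI. Now we use CPU") and the `--device CPU` argument passed to the runner. To compare runs with different Modelfile settings, a small sketch like the one below could pull both out of a saved copy of the console output. The function name `summarize_device` and the idea of saving the log to a string are my own and hypothetical. The sample lines are abbreviated copies of the log above.

```python
import re

# Matches the --device argument in the "starting llama server" line.
DEVICE_RE = re.compile(r"--device\s+(\S+)")
# Fragment of the fallback message as it appears in the log above.
FALLBACK_TEXT = "not currently supported by GenAI"


def summarize_device(log_text):
    """Return (devices passed to the runner, whether the fallback message appeared)."""
    devices = DEVICE_RE.findall(log_text)
    fell_back = FALLBACK_TEXT in log_text
    return devices, fell_back


if __name__ == "__main__":
    # Abbreviated copies of two lines from the log above (timestamps and --model omitted).
    sample = (
        'level=INFO source=genaiserver.go:105 msg="The device specified in the '
        'modelfile is not currently supported by GenAI. Now we use CPU"\n'
        'level=INFO source=genaiserver.go:270 msg="starting llama server" '
        'cmd="/usr/bin/ollama genairunner '
        '--modelname DeepSeek-R1-Distill-Qwen-14B-int4-ov:latest '
        '--device CPU --parallel 1 --port 38317"\n'
    )
    print(summarize_device(sample))
```

For the log above this reports that the runner was started with `CPU` and that the fallback message was present, which is what makes me suspect the device setting in the Modelfile rather than the model itself.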
