58 changes: 48 additions & 10 deletions modules/ollama_openvino/README.md
@@ -584,17 +584,17 @@
Getting started with large language models and using the [GenAI](https://github.
We provide two ways to download the Ollama executable: from Google Drive or from Baidu Drive.
## Google Drive
### Windows
[Download exe](https://drive.google.com/file/d/1Xo3ohbfC852KtJy_4xtn_YrYaH4Y_507/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.zip)
[Download exe](https://drive.google.com/file/d/12eXPdCSSNx53fmK7KnEZ3WFMSiaX2M-Y/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.zip)

### Linux (Ubuntu 22.04)
[Download](https://drive.google.com/file/d/1_P7CQqFUqeyx4q5y5bQ-xQsb10T9gzJD/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.tar.gz)
[Download](https://drive.google.com/file/d/11-Gmk9nEMsr7lrUV2E_gFOAhxXErLsoh/view?usp=sharing) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.tar.gz)

## Baidu Drive
### Windows
[Download exe](https://pan.baidu.com/s/1uIUjji7Mxf594CJy1vbrVw?pwd=36mq) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.zip)
[Download exe](https://pan.baidu.com/s/1nFok-DqBy-VoiXIwghE71Q?pwd=3m2m) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.zip)

### Linux (Ubuntu 22.04)
[Download](https://pan.baidu.com/s/1OCq3aKJBiCrtjLKa7kXbMw?pwd=exhz) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.tar.gz)
[Download](https://pan.baidu.com/s/16roqb9JVN_k1H_fk2JFXHg?pwd=t5q7) + [Download OpenVINO GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.tar.gz)

## Docker
### Linux
@@ -608,7 +608,7 @@
docker run -it --rm --entrypoint /bin/bash ollama_openvino_ubuntu24:v1
```
Execute the following inside the container:
```shell
source /home/ollama_ov_server/openvino_genai_ubuntu24_2025.2.0.0.dev20250513_x86_64/setupvars.sh
source /home/ollama_ov_server/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64/setupvars.sh
ollama serve
```

@@ -735,7 +735,7 @@
Let's take [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://hf-mirror.com/deeps

4. Unzip the OpenVINO GenAI package and set up the environment
```shell
cd openvino_genai_windows_2025.2.0.0.dev20250513_x86_64
cd openvino_genai_windows_2025.3.0.0.dev20250630_x86_64
setupvars.bat
```

@@ -762,6 +762,44 @@
Let's take [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://hf-mirror.com/deeps
ollama run DeepSeek-R1-Distill-Qwen-7B-int4-ov:v1
```

### Import from GGUF file (experimental feature, not recommended for production use)

| GGUF | Model Link | GGUF Size | Precision | Status | Device |
| ------------------ | ---------- | --------- | --------- | ------ | -------- |
| DeepSeek-R1-Distill-Qwen-1.5B-GGUF | [HuggingFace](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF) | 1.12GB | Q4_K_M | ✔️ | CPU, GPU |
| DeepSeek-R1-Distill-Qwen-7B-GGUF | [HuggingFace](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) | 4.68GB | Q4_K_M | ✔️ | CPU, GPU |
| Qwen2.5-1.5B-Instruct-GGUF | [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct-GGUF) | 1.12GB | Q4_K_M | ✔️ | CPU, GPU |
| Qwen2.5-3B-Instruct-GGUF | [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-3B-Instruct-GGUF) | 2.1GB | Q4_K_M | ✔️ | CPU, GPU |
| Qwen2.5-7B-Instruct-GGUF | [HuggingFace](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct-GGUF) | 4.68GB | Q4_K_M | ✔️ | CPU, GPU |
| llama-3.2-1b-instruct-GGUF | [HuggingFace](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/Llama-3.2-1B-Instruct-GGUF) | 0.75GB | Q4_K_M | ✔️ | CPU, GPU |
| llama-3.2-3b-instruct-GGUF | [HuggingFace](https://huggingface.co/unsloth/Llama-3.2-3B-Instruct-GGUF), [ModelScope](https://modelscope.cn/models/unsloth/Llama-3.2-3B-Instruct-GGUF) | 2.02GB | Q4_K_M | ✔️ | CPU, GPU |
| llama-3.1-8b-instruct-GGUF | [HuggingFace](https://huggingface.co/modularai/Llama-3.1-8B-Instruct-GGUF) | 4.92GB | Q4_K_M | ✔️ | CPU, GPU |

#### Example
Using the qwen2.5-3b-instruct-q4_k_m.gguf model as an example:
1. The corresponding Modelfile is as follows:
```shell
FROM qwen2.5-3b-instruct-q4_k_m.gguf
ModelType "OpenVINO"
InferDevice "GPU"
PARAMETER stop "<|im_end|>"
PARAMETER repeat_penalty 1.0
PARAMETER top_p 1.0
PARAMETER temperature 1.0
```
2. Create the model in Ollama

```shell
ollama create qwen2.5-3b-gguf-ov-gpu:v1 -f Modelfile
```

3. Run the model

```shell
ollama run qwen2.5-3b-gguf-ov-gpu:v1
```
Reference link: [openvino-genai-supports-gguf-models](https://blog.openvino.ai/blog-posts/openvino-genai-supports-gguf-models)
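
Once the tag has been created and `ollama serve` is running, the model can also be exercised programmatically. The following is a minimal Go sketch against Ollama's standard REST API, assuming the stock `/api/generate` endpoint on the default port 11434 and the tag created above:
```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Request/response shapes for Ollama's /api/generate endpoint.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

func main() {
	reqBody, _ := json.Marshal(generateRequest{
		Model:  "qwen2.5-3b-gguf-ov-gpu:v1", // the tag created above
		Prompt: "Why is the sky blue?",
		Stream: false, // ask for a single JSON object instead of a stream
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(reqBody))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```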

## CLI Reference

### Show model information
@@ -813,9 +851,9 @@
Then build and run Ollama from the root directory of the repository:

3. Initialize the GenAI environment

Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.zip), then extract it to a directory openvino_genai_windows_2025.2.0.0.dev20250513_x86_64.
Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.zip), then extract it to a directory openvino_genai_windows_2025.3.0.0.dev20250630_x86_64.
```shell
cd openvino_genai_windows_2025.2.0.0.dev20250513_x86_64
cd openvino_genai_windows_2025.3.0.0.dev20250630_x86_64
setupvars.bat
```

@@ -852,9 +890,9 @@
Then build and run Ollama from the root directory of the repository:

3. Initialize the GenAI environment

Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.2.0.0.dev20250513/openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.tar.gz), then extract it to a directory openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64.
Download GenAI runtime from [GenAI](https://storage.openvinotoolkit.org/repositories/openvino_genai/packages/nightly/2025.3.0.0.dev20250630/openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.tar.gz), then extract it to a directory openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64.
```shell
cd openvino_genai_ubuntu22_2025.2.0.0.dev20250513_x86_64
cd openvino_genai_ubuntu22_2025.3.0.0.dev20250630_x86_64
source setupvars.sh
```

70 changes: 70 additions & 0 deletions modules/ollama_openvino/genai/genai.go
@@ -40,6 +40,8 @@
import "C"

import (
	"archive/tar"
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
@@ -67,6 +69,74 @@
type SamplingParams struct {

type Model *C.ov_genai_llm_pipeline

func IsGGUF(filePath string) (bool, error) {
	file, err := os.Open(filePath)
	if err != nil {
		return false, fmt.Errorf("failed to open file: %v", err)
	}
	defer file.Close()

	// Read the first 4 bytes (the GGUF magic number); io.ReadFull
	// guarantees all 4 bytes are read or an error is returned.
	reader := bufio.NewReader(file)
	magicBytes := make([]byte, 4)
	_, err = io.ReadFull(reader, magicBytes)
	if err != nil {
		return false, fmt.Errorf("failed to read magic number: %v", err)
	}

	// Compare against the magic number "GGUF" in ASCII.
	return bytes.Equal(magicBytes, []byte("GGUF")), nil
}

func IsGzipByMagicBytes(filepath string) (bool, error) {
	file, err := os.Open(filepath)
	if err != nil {
		return false, err
	}
	defer file.Close()

	// A gzip stream always starts with the two-byte magic number 0x1F 0x8B;
	// io.ReadFull guarantees both bytes are read or an error is returned.
	magicBytes := make([]byte, 2)
	_, err = io.ReadFull(file, magicBytes)
	if err != nil {
		return false, err
	}

	return bytes.Equal(magicBytes, []byte{0x1F, 0x8B}), nil
}

func CopyFile(src, dst string) error {
	srcFile, err := os.Open(src)
	if err != nil {
		return err
	}
	defer srcFile.Close()

	dstFile, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer dstFile.Close()

	_, err = io.Copy(dstFile, srcFile)
	if err != nil {
		return err
	}

	// Flush the destination file to disk before returning.
	err = dstFile.Sync()
	if err != nil {
		return err
	}

	return nil
}

func UnpackTarGz(tarGzPath string, destDir string) error {
	file, err := os.Open(tarGzPath)
	if err != nil {
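Taken together, these helpers let a caller sniff a model blob's format by magic number before deciding how to stage it. The following is a minimal sketch of that call pattern; the import path and the blob path are hypothetical, and the real routing logic lives in runner.go below:
```go
package main

import (
	"fmt"
	"log"

	// Hypothetical import path; adjust to wherever the genai package lives.
	"example.com/ollama_openvino/genai"
)

func main() {
	path := "/tmp/model.blob" // hypothetical model blob

	// GGUF files start with the 4-byte magic "GGUF"; use them directly.
	if isGGUF, err := genai.IsGGUF(path); err != nil {
		log.Fatal(err)
	} else if isGGUF {
		fmt.Println("GGUF model: stage with genai.CopyFile")
		return
	}

	// Gzipped tarballs (OpenVINO IR archives) start with 0x1F 0x8B; unpack them.
	if isGzip, err := genai.IsGzipByMagicBytes(path); err != nil {
		log.Fatal(err)
	} else if isGzip {
		fmt.Println("OpenVINO IR archive: stage with genai.UnpackTarGz")
	}
}
```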
54 changes: 43 additions & 11 deletions modules/ollama_openvino/genai/runner/runner.go
@@ -381,24 +381,56 @@
func (s *Server) loadModel(mpath string, mname string, device string) {
	var err error
	ov_ir_dir := strings.ReplaceAll(mname, ":", "_")
	tempDir := filepath.Join("/tmp", ov_ir_dir)
	ov_model_path := ""

	_, err = os.Stat(tempDir)
	if os.IsNotExist(err) {
		err = genai.UnpackTarGz(mpath, tempDir)
		if err != nil {
			panic(err)
	isGGUF, err := genai.IsGGUF(mpath)
	if err != nil {
		fmt.Printf("Error checking file: %v\n", err)
		panic(err)
	}
	if isGGUF {
		log.Printf("The model is a GGUF file.")
		ov_model_path = filepath.Join(tempDir, "tmp.gguf")
		// for GGUF reader
		if _, err := os.Stat(tempDir); os.IsNotExist(err) {
			err := os.MkdirAll(tempDir, 0755)
			if err != nil {
				fmt.Printf("Error creating dir: %v\n", err)
				panic(err)
			}
			err = genai.CopyFile(mpath, ov_model_path)
			if err != nil {
				panic(err)
			}
		}
	}

	entries, err := os.ReadDir(tempDir)
	var subdirs []string
	for _, entry := range entries {
		if entry.IsDir() {
			subdirs = append(subdirs, entry.Name())
	isGzip, err := genai.IsGzipByMagicBytes(mpath)
	if err != nil {
		fmt.Printf("Error checking file: %v\n", err)
	}
	if isGzip {
		log.Printf("The model is an OpenVINO IR file.")
		// for OpenVINO IR
		_, err = os.Stat(tempDir)
		if os.IsNotExist(err) {
			err = genai.UnpackTarGz(mpath, tempDir)
			if err != nil {
				panic(err)
			}
		}

		entries, _ := os.ReadDir(tempDir)
		var subdirs []string
		for _, entry := range entries {
			if entry.IsDir() {
				subdirs = append(subdirs, entry.Name())
			}
		}

		// The unpacked archive is expected to contain a single model directory.
		ov_model_path = filepath.Join(tempDir, subdirs[0])
	}

	ov_model_path := filepath.Join(tempDir, subdirs[0])
	s.model = genai.CreatePipeline(ov_model_path, device)
	log.Printf("The model has been loaded by GenAI, ov_model_path: %s, %s", ov_model_path, device)
	s.status = ServerStatusReady