On Windows, the llama.cpp binary might be in a different location (such as `llama.cpp\build\bin\Release\`), in which case the command might be something like:
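For illustration, a direct llama.cpp invocation from that location (assuming the standard `llama-cli` binary; the model path is a hypothetical placeholder) might be:

`llama.cpp\build\bin\Release\llama-cli.exe -m C:\models\your-model.gguf -p "Hello, my thoughts are"`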
onnxruntime-genai (aka OGA) is a new framework created by Microsoft for running ONNX LLMs: https://github.com/microsoft/onnxruntime-genai/tree/main?tab=readme-ov-file
## Installation
To install:
1. `conda create -n oga-igpu python=3.9`
1. `conda activate oga-igpu`
1. `pip install -e .[llm-oga-igpu]`
   - Note: don't forget the `[llm-oga-igpu]` at the end; this is what installs onnxruntime-genai.
1. Get models:
   - The `oga-load` tool can download models from Hugging Face and build ONNX files using the OGA `model_builder`. Models can be quantized and optimized for both iGPU and CPU; see the example below.
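For example, the following command downloads, builds, and runs Phi-3-Mini on the iGPU (a variant of the CPU example later in this document; the `igpu` device flag matches the `llm-oga-igpu` install target but is stated here as an assumption):

`lemonade -i microsoft/Phi-3-mini-128k-instruct oga-load --device igpu --dtype int4 llm-prompt -p "Hello, my thoughts are"`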
`docs/ort_genai_npu.md`:
### Warnings
- The OGA wheels need to be installed in a specific order or you will end up with the wrong packages in your environment. If you see pip dependency errors, please delete your conda env and start over with a fresh environment.
### Installation
1. `cd REPO_ROOT`
1. `pip install -e .[oga-npu]`
1. Download the required OGA packages:
   1. Access the [AMD RyzenAI EA Lounge](https://account.amd.com/en/member/ryzenai-sw-ea.html#tabs-a5e122f973-item-4757898120-tab) and download `amd_oga_Oct4_2024.zip` from `Ryzen AI 1.3 EA Release`.
   1. Unzip `amd_oga_Oct4_2024.zip`
1. Set up your folder structure:
   1. Copy the `amd_oga` folder from the above zip file, if desired
   1. Create the system environment variable `AMD_OGA` and set it to the path of the `amd_oga` folder (see the example after this list)
1. Ensure you have access to the models on Hugging Face:
   1. Ensure you can access the models under [quark-quantized-onnx-llms-for-ryzen-ai-13-ea](https://huggingface.co/collections/amd/quark-quantized-onnx-llms-for-ryzen-ai-13-ea-66fc8e24927ec45504381902) on Hugging Face. Models are gated and you may have to request access.
   1. Create a Hugging Face Access Token [here](https://huggingface.co/settings/tokens). Ensure you select `Read access to contents of all public gated repos you can access` if creating a fine-grained token.
   1. Set your Hugging Face token as an environment variable: `set HF_TOKEN=<your token>`
1. Install the driver:
   1. Access the [AMD RyzenAI EA Lounge](https://account.amd.com/en/member/ryzenai-sw-ea.html#tabs-a5e122f973-item-4757898120-tab) and download `Win24AIDriver.zip` from `Ryzen AI 1.3 Preview Release`.
   1. Unzip `Win24AIDriver.zip`
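As a sketch of the environment-variable steps above, from a Windows command prompt (the path is a placeholder; `setx` persists the variable for future sessions):

`setx AMD_OGA "C:\path\to\amd_oga"`

`set HF_TOKEN=<your token>`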
### Runtime
To test basic functionality, point lemonade to any of the models under [quark-quantized-onnx-llms-for-ryzen-ai-1.3-ea](https://huggingface.co/collections/amd/quark-quantized-onnx-llms-for-ryzen-ai-13-ea-66fc8e24927ec45504381902):
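For instance (a hypothetical command following the pattern of the CPU example later in this document; the `npu` device flag and the model name are assumptions, so verify them against what you downloaded):

`lemonade -i amd/Llama-2-7b-chat-hf-awq-g128-int4-asym-fp32-onnx-ryzen-strix oga-load --device npu --dtype int4 llm-prompt -p "Hello, my thoughts are"`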
1. [RyzenAI NPU for PyTorch](#install-ryzenai-npu-for-pytorch)
1. [Code Organization](#code-organization)
1. [Contributing](#contributing)
## Install OnnxRuntime-GenAI
To install support for [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai), use `pip install -e .[llm-oga-dml]` instead of the default installation command.
You can then load supported OGA models onto CPU or iGPU with the `oga-load` tool, for example:
> Note: early access to AMD's RyzenAI NPU is also available. See the [RyzenAI NPU OGA documentation](https://github.com/onnx/turnkeyml/blob/main/docs/ort_genai_npu.md) for more information.
`lemonade -i microsoft/Phi-3-mini-128k-instruct oga-load --device cpu --dtype int4 llm-prompt -p "Hello, my thoughts are"`
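Under the hood, tools like `oga-load` drive the onnxruntime-genai Python API directly. Here is a minimal sketch of that generation loop, based on OGA's published examples (the model folder path is a placeholder, and the exact API surface varies between OGA releases):

```python
import onnxruntime_genai as og

# Load an ONNX model folder produced by model_builder (path is a placeholder)
model = og.Model("models/phi-3-mini-128k-instruct-int4")
tokenizer = og.Tokenizer(model)

# Configure generation and feed the encoded prompt
params = og.GeneratorParams(model)
params.set_search_options(max_length=64)
params.input_ids = tokenizer.encode("Hello, my thoughts are")

# Generate one token at a time until the search terminates
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```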
## Install RyzenAI NPU for PyTorch
To run your LLMs on RyzenAI NPU, first install and set up the `ryzenai-transformers` conda environment (see instructions [here](https://github.com/amd/RyzenAI-SW/blob/main/example/transformers/models/llm/docs/README.md)). Then, install `lemonade` into `ryzenai-transformers`. The `ryzenai-npu-load` Tool will become available in that environment.
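A hypothetical invocation mirroring the `oga-load` examples above (the `ryzenai-npu-load` tool name comes from this document; the model choice and prompt are illustrative, and the tool may require additional flags not shown here):

`lemonade -i meta-llama/Llama-2-7b-chat-hf ryzenai-npu-load llm-prompt -p "Hello, my thoughts are"`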