onnxruntime-genai (aka OGA) is a new framework created by Microsoft for running ONNX LLMs: https://github.com/microsoft/onnxruntime-genai/tree/main?tab=readme-ov-file
## NPU instructions
### Warnings
- Users have experienced inconsistent results across models and machines. If one model isn't working well on your laptop, try one of the other models.
- The OGA wheels need to be installed in a specific order or you will end up with the wrong packages in your environment. If you see pip dependency errors, please delete your conda env and start over with a fresh environment.
### Installation
1. NOTE: ⚠️ DO THESE STEPS IN EXACTLY THIS ORDER ⚠️
1. Install `lemonade`:
   1. Create a conda environment: `conda create -n oga-npu python=3.10` (Python 3.10 is required)
   1. Activate: `conda activate oga-npu`
   1. `cd REPO_ROOT`
   1. `pip install -e .[oga-npu]`
1. Download required OGA packages:
   1. Access the [AMD RyzenAI EA Lounge](https://account.amd.com/en/member/ryzenai-sw-ea.html#tabs-a5e122f973-item-4757898120-tab) and download `amd_oga_Oct4_2024.zip` from `Ryzen AI 1.3 Preview Release`.
   1. Unzip `amd_oga_Oct4_2024.zip`
1. Set up your folder structure:
   1. Copy all of the content inside `amd_oga` to lemonade's `REPO_ROOT\src\lemonade\tools\ort_genai\models\`
   1. Move all DLLs from `REPO_ROOT\src\lemonade\tools\ort_genai\models\libs` to `REPO_ROOT\src\lemonade\tools\ort_genai\models\`
1. Ensure you have access to the models on Hugging Face:
   1. Confirm that you can open the models under [quark-quantized-onnx-llms-for-ryzen-ai-13-ea](https://huggingface.co/collections/amd/quark-quantized-onnx-llms-for-ryzen-ai-13-ea-66fc8e24927ec45504381902) on Hugging Face. The models are gated, so you may have to request access.
   1. Create a Hugging Face Access Token [here](https://huggingface.co/settings/tokens). If you create a fine-grained token, select `Read access to contents of all public gated repos you can access`.
   1. Set your Hugging Face token as an environment variable: `set HF_TOKEN=<your token>`
1. Install the driver:
   1. Access the [AMD RyzenAI EA Lounge](https://account.amd.com/en/member/ryzenai-sw-ea.html#tabs-a5e122f973-item-4757898120-tab) and download `Win24AIDriver.zip` from `Ryzen AI 1.3 Preview Release`.
   1. Unzip `Win24AIDriver.zip`
   1. Right click `kipudrv.inf` and select `Install`
   1. Check under `Device Manager` to ensure that `NPU Compute Accelerator` is using version `32.0.203.219`.
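
Because the order of these steps matters, it can help to sanity-check the environment before moving on. The sketch below is not part of lemonade: `preflight` is a hypothetical helper, and it only checks what the steps above state (Python 3.10, `HF_TOKEN` set, DLLs moved out of `libs`). It assumes you pass your own `REPO_ROOT`, and it cannot verify the driver version.

```python
import os
import sys
from pathlib import Path


def preflight(repo_root, env=None, version=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper, not a lemonade API. `env` and `version` default to
    the current process environment and interpreter version.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []

    # The install steps require Python 3.10.
    if version != (3, 10):
        problems.append(f"Python 3.10 is required, found {version[0]}.{version[1]}")

    # The gated Hugging Face models need a token in HF_TOKEN.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")

    # Folder layout from the "Set up your folder structure" step.
    models = Path(repo_root) / "src" / "lemonade" / "tools" / "ort_genai" / "models"
    if not models.is_dir():
        problems.append(f"missing folder: {models}")
    else:
        if not any(models.glob("*.dll")):
            problems.append(f"no DLLs found directly in {models}")
        libs = models / "libs"
        if libs.is_dir() and any(libs.glob("*.dll")):
            problems.append(f"DLLs still in {libs}; move them up one level")

    return problems
```

For example, running `print(preflight("."))` from `REPO_ROOT` inside the activated `oga-npu` environment should print `[]` if the layout matches the steps above.
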
### Runtime
To test basic functionality, point lemonade to any of the models under [quark-quantized-onnx-llms-for-ryzen-ai-13-ea](https://huggingface.co/collections/amd/quark-quantized-onnx-llms-for-ryzen-ai-13-ea-66fc8e24927ec45504381902):