Description
The CPU backend always loads. With auto-fallback set to false and SkipCheck() enabled, loading crashes and throws a native API exception. Forcing CUDA does not work.
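For reference, the forced-CUDA configuration that triggers the crash looks roughly like this. This is a minimal sketch, not the exact code from my project; the specific `NativeLibraryConfig` call chain, model path, and `GpuLayerCount` value are assumptions:

```csharp
using LLama;
using LLama.Common;
using LLama.Native;

// Force the CUDA backend: disable CPU auto-fallback and skip the feature check.
// With 0.24.0 this configuration throws a native API exception on load.
NativeLibraryConfig.All
    .WithCuda()
    .WithAutoFallback(false)
    .SkipCheck();

var parameters = new ModelParams(@"models\qwen3.gguf") // hypothetical path
{
    GpuLayerCount = 35 // offload layers to the RTX 3090 (illustrative value)
};

// On 0.23.0 this loads with backend_ptrs.size() = 2 (CUDA + CPU);
// on 0.24.0 only one backend is reported and CUDA is never used.
using var weights = LLamaWeights.LoadFromFile(parameters);
```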
Temporary solution:
Use the NextCoder model instead of my Qwen3 model, with LLamaSharp version 0.23.0 and the CUDA backend.
Llama then loads and shows backend_ptrs.size() = 2; with 0.24.0 it shows 1.
The problem comes from 0.24.0 specifically.
Strange part:
I have 0.24.0 with CUDA working in a MAUI application that is available on the Microsoft Store, so I know 0.24.0 works for CUDA, but that app uses a custom library that does not seem to work for the console/API server application.
Changing the target platform from Any CPU to x64 is also required for the backend .dlls to be found; x86 does not work correctly.
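As a concrete example, pinning the platform in the project file avoids the Any CPU resolution problem. A hypothetical minimal .csproj fragment:

```xml
<PropertyGroup>
  <TargetFramework>net9.0</TargetFramework>
  <!-- Backend .dlls are only found when targeting x64; Any CPU and x86 fail -->
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```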
Reproduction Steps
1. Create a .NET 9 console application or API server and reference the latest CUDA backend.
2. Attempt to use a Qwen3 model with the latest LLamaSharp CUDA backend.
3. Load the weights and check which backend they are assigned to.
Environment & Configuration
- Operating system: Windows 11
- .NET runtime version: .NET 9
- LLamaSharp version: 0.24.0
- CUDA version (if you are using cuda backend): 12.6 & 12.9
- CPU & GPU device: RTX 3090
Known Workarounds
No response