Commit adbc520

jeremyfowers, amd-pworfolk, and danielholanda authored
Release v6.0.2 (#294)
- Add the "echo" parameter to OpenAI completions (@danielholanda)
- New dedicated report tool for LLM CSVs, as well as ASCII tables (@amd-pworfolk)
- Properly raise and transmit server model load failures (@jeremyfowers)
- Add documentation for Lemonade_Server_Installer.exe (@jeremyfowers)
- Add telemetry to server: performance, input tokens, output tokens, and prompt tracing (@danielholanda)
- Ensure that Ryzen AI Hybrid support is not installed on incompatible devices (@danielholanda)

Co-authored-by: amd-pworfolk <[email protected]>
Co-authored-by: Daniel Holanda <[email protected]>
1 parent cf7f1c6 commit adbc520

13 files changed: 1224 additions and 15 deletions

docs/lemonade/getting_started.md

Lines changed: 12 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -58,6 +58,10 @@ To install `lemonade` from source code:
5858
1. Follow the same instructions as in the [PyPI installation](#from-pypi), except replace `turnkeyml` with `.`.
5959
- For example: `pip install -e .[llm-oga-igpu]`
6060

61+
## From Lemonade_Server_Installer.exe
62+
63+
The `lemonade` server is available as a standalone tool with a one-click Windows installer `.exe`. Check out the [Lemonade_Server_Installer.exe guide](lemonade_server_exe.md) for installation instructions and the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the functionality.
64+
6165
# CLI Commands
6266

6367
The `lemonade` CLI uses a unique command syntax that enables convenient interoperability between models, frameworks, devices, accuracy tests, and deployment options.
@@ -135,6 +139,14 @@ That command will run a few warmup iterations, then a few generation iterations
135139
136140
The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` or `lemonade huggingface-bench -h`.
137141
142+
## LLM Report
143+
144+
To see a report that contains all the benchmarking results and all the accuracy results, use the `report` tool with the `--perf` flag:
145+
146+
`lemonade report --perf`
147+
148+
The results can be filtered by model name, device type and data type. See how by running `lemonade report -h`.
149+
138150
## Memory Usage
139151
140152
The peak memory used by the `lemonade` build is captured in the build output. To capture more granular

docs/lemonade/lemonade_server_exe.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
1+
# Lemonade Server Installer
2+
3+
The `lemonade` server is available as a standalone tool with a one-click Windows installer `.exe`. Check out the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the functionality.
4+
5+
## GUI Installation and Usage
6+
7+
> *Note:* you may need to give your browser or OS permission to download or install the .exe.
8+
9+
1. Navigate to the [latest release](https://github.com/onnx/turnkeyml/releases/latest).
10+
1. Scroll to the bottom and click `Lemonade_Server_Installer.exe` to download.
11+
1. Double-click the `Lemonade_Server_Installer.exe` and follow the instructions.
12+
13+
Now that you have the server installed, you can double-click the desktop shortcut to run the server process. From there, you can connect it to applications that are compatible with the OpenAI completions API.
14+
15+
## Silent Installation and Command Line Usage
16+
17+
Silent installation and command line usage are useful if you want to fully integrate `lemonade` server into your own application. This guide provides fully automated steps for downloading, installing, and running `lemonade` server so that your users don't have to install `lemonade` separately.
18+
19+
Definitions:
20+
- "Silent installation" refers to an automatic command that installs `lemonade` server without showing a GUI or prompting the user. It assumes that the end user fully accepts the license terms, so be sure that your own application makes this clear to the user.
21+
- "Command line usage" allows the server process to be launched programmatically, so that your application can manage starting and stopping the server process on your user's behalf.
22+
23+
### Download
24+
25+
Follow these instructions to download a copy of `Lemonade_Server_Installer.exe`.
26+
27+
#### cURL Download
28+
29+
In a `bash` terminal, such as `git bash`:
30+
31+
Download the latest version:
32+
33+
```bash
34+
curl -L -o ".\Lemonade_Server_Installer.exe" https://github.com/onnx/turnkeyml/releases/latest/download/Lemonade_Server_Installer.exe
35+
```
36+
37+
Download a specific version:
38+
39+
```bash
40+
curl -L -o ".\Lemonade_Server_Installer.exe" https://github.com/onnx/turnkeyml/releases/download/v6.0.0/Lemonade_Server_Installer.exe
41+
```
42+
43+
#### PowerShell Download
44+
45+
In a PowerShell terminal:
46+
47+
Download the latest version:
48+
49+
```powershell
50+
Invoke-WebRequest -Uri "https://github.com/onnx/turnkeyml/releases/latest/download/Lemonade_Server_Installer.exe" -OutFile "Lemonade_Server_Installer.exe"
51+
```
52+
53+
Download a specific version:
54+
55+
```powershell
56+
Invoke-WebRequest -Uri "https://github.com/onnx/turnkeyml/releases/download/v6.0.0/Lemonade_Server_Installer.exe" -OutFile "Lemonade_Server_Installer.exe"
57+
```
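
The two URL patterns above (latest release versus a tagged release) can also be used from an application's own setup code. The following is a minimal Python sketch, not part of this commit: the helper names `installer_url` and `download_installer` are hypothetical, and only the URLs themselves come from the guide.

```python
from typing import Optional
from urllib.request import urlretrieve

# URL pieces taken from the download commands in this guide.
RELEASES_URL = "https://github.com/onnx/turnkeyml/releases"
INSTALLER_NAME = "Lemonade_Server_Installer.exe"


def installer_url(version: Optional[str] = None) -> str:
    """Return the installer URL for the latest release, or for a tag such as "v6.0.0"."""
    if version is None:
        return f"{RELEASES_URL}/latest/download/{INSTALLER_NAME}"
    return f"{RELEASES_URL}/download/{version}/{INSTALLER_NAME}"


def download_installer(dest: str = INSTALLER_NAME, version: Optional[str] = None) -> str:
    """Download the installer to `dest` (hypothetical helper) and return the local path."""
    # urlretrieve follows the HTTP redirects that GitHub release URLs use.
    path, _headers = urlretrieve(installer_url(version), dest)
    return path
```

Calling `download_installer()` would fetch the latest installer into the current directory, while `download_installer(version="v6.0.0")` would pin a specific release.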
58+
59+
### Silent Installation
60+
61+
Silent installation runs `Lemonade_Server_Installer.exe` without a GUI and automatically accepts all prompts.
62+
63+
In a `cmd.exe` terminal:
64+
65+
Install *with* Ryzen AI hybrid support:
66+
67+
```bash
68+
Lemonade_Server_Installer.exe /S /Extras=hybrid
69+
```
70+
71+
Install *without* Ryzen AI hybrid support:
72+
73+
```bash
74+
Lemonade_Server_Installer.exe /S
75+
```
76+
77+
The install directory can also be changed from the default by using `/D` as the last argument.
78+
79+
For example:
80+
81+
```bash
82+
Lemonade_Server_Installer.exe /S /Extras=hybrid /D=C:\a\new\path
83+
```
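
An application that drives the installer itself might assemble these arguments programmatically. The sketch below is illustrative only: `silent_install_command` and `run_silent_install` are hypothetical helper names, and it assumes the install path contains no spaces, because `subprocess` would quote such a path and NSIS installers expect `/D=` to be unquoted.

```python
import subprocess
from typing import List, Optional


def silent_install_command(
    installer: str, hybrid: bool = False, install_dir: Optional[str] = None
) -> List[str]:
    """Build the silent-install command line described in this guide."""
    cmd = [installer, "/S"]
    if hybrid:
        cmd.append("/Extras=hybrid")
    if install_dir is not None:
        # /D must be the last argument, so it is always appended at the end.
        # Assumption: install_dir has no spaces (see the note above the block).
        cmd.append(f"/D={install_dir}")
    return cmd


def run_silent_install(
    installer: str, hybrid: bool = False, install_dir: Optional[str] = None
) -> int:
    """Run the installer and return its exit code (hypothetical wrapper)."""
    result = subprocess.run(
        silent_install_command(installer, hybrid, install_dir), check=False
    )
    return result.returncode
```

Keeping the command builder separate from the call to `subprocess.run` makes the argument order easy to check without launching the installer.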
84+
85+
### Command Line Invocation
86+
87+
Command line invocation starts the `lemonade` server process so that your application can connect to it via REST API endpoints.
88+
89+
#### Foreground Process
90+
91+
These steps will open `lemonade` server in a terminal window that is visible to users. The user can exit the server by closing the window.
92+
93+
In a `cmd.exe` terminal:
94+
95+
```bash
96+
conda run --no-capture-output -p INSTALL_DIR\lemonade_server\lemon_env lemonade serve
97+
```
98+
99+
Where `INSTALL_DIR` is the installation path of `lemonade_server`.
100+
101+
For example, if you used the default installation directory and your username is USERNAME:
102+
103+
```bash
104+
C:\Windows\System32\cmd.exe /C conda run --no-capture-output -p C:\Users\USERNAME\AppData\Local\lemonade_server\lemon_env lemonade serve
105+
```
106+
107+
#### Background Process
108+
109+
This command will start `lemonade` server without opening a window. Your application is responsible for terminating the process and any child processes it creates.
110+
111+
In a PowerShell terminal:
112+
113+
```powershell
114+
$serverProcess = Start-Process -FilePath "C:\Windows\System32\cmd.exe" -ArgumentList "/C conda run --no-capture-output -p INSTALL_DIR\lemonade_server\lemon_env lemonade serve" -RedirectStandardOutput lemonade_out.txt -RedirectStandardError lemonade_err.txt -PassThru -NoNewWindow
115+
```
116+
117+
Where `INSTALL_DIR` is the installation path of `lemonade_server`.
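
For applications written in Python, the same background launch and cleanup could be sketched as follows. This is a hedged illustration rather than a documented API: the function names are hypothetical, the command mirrors the `cmd.exe /C conda run ...` invocation above, and stopping relies on the standard Windows `taskkill` tool with `/T` to end child processes as well.

```python
import subprocess
from typing import List

CMD_EXE = "C:\\Windows\\System32\\cmd.exe"


def serve_command(install_dir: str) -> List[str]:
    """Build the documented launch command for a given INSTALL_DIR."""
    env_path = f"{install_dir}\\lemonade_server\\lemon_env"
    return [CMD_EXE, "/C", "conda", "run", "--no-capture-output",
            "-p", env_path, "lemonade", "serve"]


def start_server(install_dir: str,
                 out_path: str = "lemonade_out.txt",
                 err_path: str = "lemonade_err.txt") -> subprocess.Popen:
    """Start the server without a window, redirecting its output to files."""
    # CREATE_NO_WINDOW only exists on Windows; fall back to 0 elsewhere.
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
    with open(out_path, "w") as out, open(err_path, "w") as err:
        return subprocess.Popen(serve_command(install_dir),
                                stdout=out, stderr=err, creationflags=flags)


def stop_server(process: subprocess.Popen) -> None:
    """Terminate the server process and any child processes it created."""
    subprocess.run(["taskkill", "/PID", str(process.pid), "/T", "/F"], check=False)
```

An application would typically call `start_server(...)` at startup, keep the returned process handle, and call `stop_server(...)` on exit.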

docs/lemonade/server_spec.md

Lines changed: 25 additions & 1 deletion
@@ -35,7 +35,13 @@ The additional endpoints under development are:
3535

3636
> **NOTE:** This server is intended for use on local systems only. Do not expose the server port to the open internet.
3737
38-
First, install lemonade with your desired backend (e.g., `pip install lemonade[llm]`). Then, run the following command to start the server:
38+
### Windows Installer
39+
40+
See the [Lemonade_Server_Installer.exe instructions](lemonade_server_exe.md) to get started.
41+
42+
### Python Environment
43+
44+
If you have `lemonade` [installed in a Python environment](getting_started.md#from-pypi), simply activate it and run the following command to start the server:
3945

4046
```bash
4147
lemonade serve
@@ -124,6 +130,7 @@ Text Completions API. You provide a prompt and receive a completion. This API wi
124130
| `model` | Yes | The model to use for the completion. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
125131
| `stream` | No | If true, tokens will be sent as they are generated. If false, the response will be sent as a single message once complete. Defaults to false. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
126132
| `stop` | No | Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a string or an array of strings. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
133+
| `echo` | No | Echo back the prompt in addition to the completion. Only available in non-streaming mode. | <sub>![Status](https://img.shields.io/badge/available-green)</sub> |
127134
| `logprobs` | No | Include log probabilities of the output tokens. If true, returns the log probability of each output token. Defaults to false. | <sub>![Status](https://img.shields.io/badge/WIP-yellow)</sub> |
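
As a rough illustration of the new `echo` parameter, a client request could be built as in the sketch below. Several details are assumptions rather than facts from this diff: the endpoint path `/api/v0/completions` is inferred from the `/api/v0/stats` URL shown elsewhere in the spec, the model name passed by the caller is a placeholder, and the helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumption: the completions endpoint lives next to the documented stats endpoint.
COMPLETIONS_URL = "http://localhost:8000/api/v0/completions"


def build_completion_request(model: str, prompt: str, echo: bool = True) -> Request:
    """Build a non-streaming completions request that asks for the prompt to be echoed."""
    payload = {
        "model": model,
        "prompt": prompt,
        "echo": echo,      # echo the prompt back along with the completion
        "stream": False,   # echo is documented for non-streaming mode
    }
    return Request(
        COMPLETIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model: str, prompt: str, echo: bool = True) -> dict:
    """Send the request to a locally running server and return the parsed JSON reply."""
    with urlopen(build_completion_request(model, prompt, echo)) as response:
        return json.load(response)
```

Building the request separately from sending it lets the payload be inspected without a running server.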
128135

129136

@@ -343,3 +350,20 @@ curl http://localhost:8000/api/v0/stats
343350
"decode_token_times": [0.01, 0.02, 0.03, 0.04, 0.05]
344351
}
345352
```
353+
354+
# Debugging
355+
356+
To help debug the Lemonade server, you can use the `--log-level` parameter to control the verbosity of logging information. The server supports multiple logging levels that provide increasing amounts of detail about server operations.
357+
358+
```
359+
lemonade serve --log-level [level]
360+
```
361+
362+
Where `[level]` can be one of:
363+
364+
- **critical**: Only critical errors that prevent server operation.
365+
- **error**: Error conditions that might allow continued operation.
366+
- **warning**: Warning conditions that should be addressed.
367+
- **info**: (Default) General informational messages about server operation.
368+
- **debug**: Detailed diagnostic information for troubleshooting, including metrics such as input/output token counts, Time To First Token (TTFT), and Tokens Per Second (TPS).
369+
- **trace**: Very detailed tracing information, including everything from debug level plus all input prompts.
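
An application that launches the server could validate the level before building the command line. The following Python sketch is only an illustration of the list above: the helper names are hypothetical, and the two predicate functions simply restate which levels the list says include metrics and prompts.

```python
from typing import List

# Levels in the order documented above, from least to most verbose.
LOG_LEVELS = ("critical", "error", "warning", "info", "debug", "trace")


def serve_args(log_level: str = "info") -> List[str]:
    """Build a `lemonade serve` command line with a validated --log-level."""
    level = log_level.lower()
    if level not in LOG_LEVELS:
        raise ValueError(f"log level must be one of {LOG_LEVELS}, got {log_level!r}")
    return ["lemonade", "serve", "--log-level", level]


def includes_metrics(log_level: str) -> bool:
    """Token counts, TTFT, and TPS are described for debug and above."""
    return log_level.lower() in ("debug", "trace")


def includes_prompts(log_level: str) -> bool:
    """Input prompts are described for the trace level only."""
    return log_level.lower() == "trace"
```

Since `trace` records every input prompt, an application may want to check `includes_prompts` before enabling it on a user's machine.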

installer/Installer.nsi

Lines changed: 66 additions & 1 deletion
@@ -20,6 +20,17 @@ Var LEMONADE_CONDA_ENV
2020
Var HYBRID_SELECTED
2121
Var HYBRID_CLI_OPTION
2222

23+
; Variables for CPU detection
24+
Var cpuName
25+
Var isCpuSupported
26+
Var ryzenAiPos
27+
Var seriesStartPos
28+
Var currentChar
29+
30+
; Used for string manipulation
31+
!include "StrFunc.nsh"
32+
${StrLoc}
33+
2334
; Define a section for the installation
2435
Section "Install Main Components" SEC01
2536
SectionIn RO ; Read only, always installed
@@ -296,7 +307,7 @@ LangString MUI_BUTTONTEXT_FINISH "${LANG_ENGLISH}" "Finish"
296307
LangString MUI_TEXT_LICENSE_TITLE ${LANG_ENGLISH} "AMD License Agreement"
297308
LangString MUI_TEXT_LICENSE_SUBTITLE ${LANG_ENGLISH} "Please review the license terms before installing AMD Ryzen AI Hybrid Execution Mode."
298309
LangString DESC_SEC01 ${LANG_ENGLISH} "The minimum set of dependencies for a lemonade server that runs LLMs on CPU."
299-
LangString DESC_HybridSec ${LANG_ENGLISH} "Add support for running LLMs on Ryzen AI hybrid execution mode, which uses both the NPU and iGPU for improved performance on Ryzen AI 300-series processors."
310+
LangString DESC_HybridSec ${LANG_ENGLISH} "Add support for running LLMs on Ryzen AI hybrid execution mode, which uses both the NPU and iGPU for improved performance. Only available on Ryzen AI 300-series processors."
300311

301312
; Insert the description macros
302313
!insertmacro MUI_FUNCTION_DESCRIPTION_BEGIN
@@ -317,6 +328,56 @@ Function .onInit
317328
StrCpy $InstDir "$LOCALAPPDATA\lemonade_server"
318329
${EndIf}
319330

331+
; Check CPU name to determine if Hybrid section should be enabled
332+
DetailPrint "Checking CPU model..."
333+
334+
; Use registry query to get CPU name
335+
nsExec::ExecToStack 'reg query "HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor\0" /v ProcessorNameString'
336+
Pop $0 ; Return value
337+
Pop $cpuName ; Output (CPU name)
338+
DetailPrint "Detected CPU: $cpuName"
339+
340+
; Check if CPU name contains "Ryzen AI" and a 3-digit number starting with 3
341+
StrCpy $isCpuSupported "false" ; Initialize CPU allowed flag to false
342+
343+
${StrLoc} $ryzenAiPos $cpuName "Ryzen AI" ">"
344+
${If} $ryzenAiPos != ""
345+
; Found "Ryzen AI", now look for 3xx series
346+
${StrLoc} $seriesStartPos $cpuName " 3" ">"
347+
${If} $seriesStartPos != ""
348+
; Check if the character after "3" is a digit (first digit of model number)
349+
StrCpy $currentChar $cpuName 1 $seriesStartPos+2
350+
${If} $currentChar >= "0"
351+
${AndIf} $currentChar <= "9"
352+
; Check if the character after that is also a digit (second digit of model number)
353+
StrCpy $currentChar $cpuName 1 $seriesStartPos+3
354+
${If} $currentChar >= "0"
355+
${AndIf} $currentChar <= "9"
356+
; Check if the character after the third digit is a space or end of string
357+
StrCpy $currentChar $cpuName 1 $seriesStartPos+4
358+
${If} $currentChar == " "
359+
${OrIf} $currentChar == ""
360+
; Found a complete 3-digit number starting with 3
361+
StrCpy $isCpuSupported "true"
362+
DetailPrint "Detected Ryzen AI 3xx series processor"
363+
${EndIf}
364+
${EndIf}
365+
${EndIf}
366+
${EndIf}
367+
${EndIf}
368+
369+
DetailPrint "CPU is compatible with Ryzen AI hybrid software: $isCpuSupported"
370+
371+
; Check whether the CPU was detected as supported
372+
${If} $isCpuSupported != "true"
373+
; Disable Hybrid section if the CPU is not supported
374+
SectionGetFlags ${HybridSec} $0
375+
IntOp $0 $0 & ${SECTION_OFF} ; Turn off selection
376+
IntOp $0 $0 | ${SF_RO} ; Make it read-only (can't be selected)
377+
SectionSetFlags ${HybridSec} $0
378+
StrCpy $HYBRID_SELECTED "false"
379+
${EndIf}
380+
320381
; Disable hybrid mode by default in silent mode
321382
; Use /Extras="hybrid" option to enable it
322383
${If} ${Silent}
@@ -327,6 +388,10 @@ Function .onInit
327388
${IfNot} $HYBRID_CLI_OPTION == "hybrid"
328389
SectionSetFlags ${HybridSec} 0
329390
StrCpy $HYBRID_SELECTED "false"
391+
${ElseIf} $isCpuSupported != "true"
392+
; Don't allow hybrid mode if the CPU is not supported, even if specified in command line
393+
SectionSetFlags ${HybridSec} 0
394+
StrCpy $HYBRID_SELECTED "false"
330395
${EndIf}
331396
${EndIf}
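
The intent of the CPU check, as its comments describe it, is: the processor name must contain "Ryzen AI" and a three-digit model number starting with 3 that is followed by a space or the end of the string. The Python sketch below restates that rule for readability. It is not the installer's code, the function names are hypothetical, and it accepts a matching number anywhere in the name rather than only at the first " 3" occurrence.

```python
import re

# A space, then "3" and two more digits, then a space or the end of the string.
_SERIES_300 = re.compile(r" 3\d\d( |$)")


def is_hybrid_supported(cpu_name: str) -> bool:
    """Return True if the name looks like a Ryzen AI 300-series processor."""
    return "Ryzen AI" in cpu_name and _SERIES_300.search(cpu_name) is not None


def hybrid_enabled_in_silent_mode(extras: str, cpu_name: str) -> bool:
    """Silent installs get hybrid support only if requested and the CPU qualifies."""
    return extras == "hybrid" and is_hybrid_supported(cpu_name)
```

The second function mirrors the silent-mode branch above: passing `/Extras=hybrid` on an unsupported CPU still leaves hybrid support disabled.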
332397

setup.py

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
2121
"lemonade.tools",
2222
"lemonade.tools.ort_genai",
2323
"lemonade.tools.quark",
24+
"lemonade.tools.report",
2425
"turnkeyml_models",
2526
"turnkeyml_models.graph_convolutions",
2627
"turnkeyml_models.selftest",

src/lemonade/cli.py

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,6 @@
44
import turnkeyml.cli.cli as cli
55
from turnkeyml.sequence import Sequence
66
from turnkeyml.tools.management_tools import Cache, Version, SystemInfo
7-
from turnkeyml.tools.report import Report
87
from turnkeyml.state import State
98

109
from lemonade.tools.huggingface_load import (
@@ -24,6 +23,7 @@
2423
from lemonade.tools.prompt import LLMPrompt
2524
from lemonade.tools.quark.quark_load import QuarkLoad
2625
from lemonade.tools.quark.quark_quantize import QuarkQuantize
26+
from lemonade.tools.report.llm_report import LemonadeReport
2727
from lemonade.tools.serve import Server
2828

2929

@@ -43,9 +43,9 @@ def main():
4343
OgaBench,
4444
QuarkQuantize,
4545
QuarkLoad,
46+
LemonadeReport,
4647
Server,
4748
# Inherited from TurnkeyML
48-
Report,
4949
Cache,
5050
Version,
5151
SystemInfo,

src/lemonade/tools/report/__init__.py

Whitespace-only changes.
