v6.0.0
Summary
This is a major release that introduces an OpenAI-compatible server in a completely new serve tool, support for Quark quantization in the new quark tool, and many other fixes/improvements.
Breaking Changes
New OpenAI-Compatible Server
The previous serve Tool has been replaced by a new standalone serving command. This new server has OpenAI API compatibility and will add Ollama compatibility in the near future.
- Old usage:
lemoande -i CHECKPOINT oga-load --args serve - New usage:
lemonade serve, then use REST APIs to control model loading, completions, etc. See https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md to learn more.
The server can also be installed and used with no-code by running Lemonade_Server_Installer.exe, which is provided as a release asset in this and all future releases.
The server code was also moved out of tools/chat.py into its own file in tools/serve.py. We also renamed chat.py to prompt.py for clarity, since that file now only contains the prompting tool.
The LEAP name has been deprecated
In the interest of reducing naming confusion, the "LEAP API" is now simply the "high-level lemonade API".
- Old usage:
from lemonade.leap import from_pretrained - New usage:
from lemonade.api import from_pretrained
Summary of Contributions
- The base checkpoint for models is retrieved from the Hugging Face API at loading time (@ramkrishna2910)
- The benchmarking tools (huggingface-bench, oga-bench, and llamacpp-bench) have been refactored to reduce code duplication and improve maintainability. They now also support a list of prompts (or prompt lengths) to be benchmarked:
--prompts 128 256 512(@amd-pworfolk) - The
avg_accuracystats has been renamed toaverage_mmlu_accuracyfor clarity with respect to non-MMLU accuracy tests (@jeremyfowers), (attn @apsonawane) - Introduce
Lemonade_Server_Installer.exe(@jeremyfowers) - Implement an OpenAI-compatible server and remove the old
servetool (@danielholanda) - Rename
chatmodule toprompt(@jeremyfowers) - Improved lemonade getting started documentation and remove the "LEAP" branding (@jeremyfowers)
- OGA 0.6.0 is the default package for CPU, CUDA, and DML (@jeremyfowers)
- Add support for Quark quantization with a new
quark-quantizetool (@iswaryaalex) - Clean up the lemonade getting started docs and remove some deprecated tools (@jeremyfowers)
New Contributors
- @iswaryaalex made their first contribution in #290
Full Changelog: v5.1.1...v6.0.0