v1.4.0 #838
ggerganov announced in Announcements
Replies: 1 comment

Is it possible to get the command tool binary for Windows in this version?
Overview
This is a new major release adding integer quantization and partial GPU (NVIDIA) support.
Integer quantization
This allows the ggml Whisper models to be converted from the default 16-bit floating point weights to 4-, 5- or 8-bit integer weights. The resulting quantized models are smaller in disk size and memory usage and can be processed faster on some architectures. The transcription quality is degraded to some extent - not quantified at the moment.
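A minimal sketch of the workflow, assuming a whisper.cpp checkout with the `quantize` example built from this release and an already-downloaded `ggml-base.en.bin` model (paths and sample file are illustrative):

```shell
# build the quantization tool
make quantize

# convert the F16 model to 5-bit (Q5_0) integer weights
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0

# run transcription with the quantized model as usual
./main -m models/ggml-base.en-q5_0.bin -f samples/jfk.wav
```

The quantized file should come out noticeably smaller than the F16 original - very roughly a third of the size for Q5_0, depending on the mode chosen.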
Supported quantization modes: Q4_0, Q4_1, Q4_2, Q5_0, Q5_1, Q8_0

Q5 quantized models are available at: https://whisper.ggerganov.com

Here is a quantitative evaluation of the different quantization modes applied to the LLaMA and RWKV large language models. These results can give an impression about the expected quality, size and performance improvements for quantized Whisper models:
LLaMA quantization (measured on M1 Pro)
ref: https://github.com/ggerganov/llama.cpp#quantization
RWKV quantization
Modes compared: Q4_0, Q4_1, Q4_2, Q5_0, Q5_1, Q8_0, FP16, FP32
ref: ggml-org/ggml#89 (comment)
This feature is possible thanks to the many contributions in the llama.cpp project: "ggml : improve integer quantization"
GPU support via cuBLAS
Using cuBLAS results mainly in improved Encoder inference speed. I haven't done proper timings, but one can expect at least 2-3 times faster Encoder evaluation with modern NVIDIA GPU cards compared to CPU-only processing. Feel free to post your Encoder benchmarks in issue #89.
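As a sketch, assuming the NVIDIA CUDA toolkit is installed and using the Makefile flag from this release (verify against your checkout):

```shell
# rebuild whisper.cpp with cuBLAS-accelerated matrix multiplication
make clean
WHISPER_CUBLAS=1 make -j
```

The resulting binaries offload the large matrix multiplications of the Encoder to the GPU via cuBLAS, while the rest of the pipeline stays on the CPU - hence the "partial" GPU support mentioned in the overview.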
This is another feature made possible by the llama.cpp project. Special recognition to @slaren for putting almost all of this work together.
This release remains in "beta" stage as I haven't verified that everything works as expected.
What's Changed
New Contributors
Full Changelog: v1.3.0...v1.4.0
This discussion was created from the release v1.4.0.