
Commit b69e717

bump version to v0.2.6 (#1299)
1 parent d6c9847 commit b69e717

File tree

5 files changed: +40 -15 lines

- README.md
- README_zh-CN.md
- docs/en/inference/vl_pipeline.md
- docs/zh_cn/inference/vl_pipeline.md
- lmdeploy/version.py


README.md

Lines changed: 6 additions & 4 deletions

@@ -150,12 +150,14 @@ Please overview [getting_started](./docs/en/get_started.md) section for the basi
 For detailed user guides and advanced guides, please refer to our [tutorials](https://lmdeploy.readthedocs.io/en/latest/):
 
 - User Guide
-  - [Inference pipeline](./docs/en/inference/pipeline.md)
-  - [Inference Engine - TurboMind](docs/en/inference/turbomind.md)
-  - [Inference Engine - PyTorch](docs/en/inference/pytorch.md)
-  - [Serving](docs/en/serving/api_server.md)
+  - [LLM Inference pipeline](./docs/en/inference/pipeline.md)
+  - [VLM Inference pipeline](./docs/en/inference/vl_pipeline.md)
+  - [LLM Serving](docs/en/serving/api_server.md)
+  - [VLM Serving](docs/en/serving/api_server_vl.md)
   - [Quantization](docs/en/quantization)
 - Advance Guide
+  - [Inference Engine - TurboMind](docs/en/inference/turbomind.md)
+  - [Inference Engine - PyTorch](docs/en/inference/pytorch.md)
   - [Customize chat templates](docs/en/advance/chat_template.md)
   - [Add a new model](docs/en/advance/pytorch_new_model.md)
   - gemm tuning

README_zh-CN.md

Lines changed: 6 additions & 4 deletions

@@ -151,12 +151,14 @@ print(response)
 To help users learn more about LMDeploy, we have prepared user guides and advanced guides. Please read our [documentation](https://lmdeploy.readthedocs.io/zh-cn/latest/)
 
 - User Guide
-  - [Inference pipeline](./docs/zh_cn/inference/pipeline.md)
-  - [Inference Engine - TurboMind](./docs/zh_cn/inference/turbomind.md)
-  - [Inference Engine - PyTorch](./docs/zh_cn/inference/pytorch.md)
-  - [Serving](./docs/zh_cn/serving/api_server.md)
+  - [LLM Inference pipeline](./docs/zh_cn/inference/pipeline.md)
+  - [VLM Inference pipeline](./docs/zh_cn/inference/vl_pipeline.md)
+  - [LLM Serving](./docs/zh_cn/serving/api_server.md)
+  - [VLM Serving](./docs/zh_cn/serving/api_server_vl.md)
   - [Model Quantization](./docs/zh_cn/quantization)
 - Advanced Guide
+  - [Inference Engine - TurboMind](./docs/zh_cn/inference/turbomind.md)
+  - [Inference Engine - PyTorch](./docs/zh_cn/inference/pytorch.md)
   - [Customize chat templates](./docs/zh_cn/advance/chat_template.md)
   - [Add a new model](./docs/zh_cn/advance/pytorch_new_model.md)
   - gemm tuning

docs/en/inference/vl_pipeline.md

Lines changed: 14 additions & 4 deletions

@@ -1,8 +1,18 @@
 # VLM Offline Inference Pipeline
 
-LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the the Large Language Model (LLM) inference [pipeline](./pipeline.md).
-In this article, we will take the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example, exhibiting the powerful capabilities of the VLM pipeline through various examples.
-First, we will demonstrate the most basic utilization of the pipeline and progressively unveil additional functionalities by configuring the engine parameters and generation arguments, such as tensor parallelism, setting context window size, and random sampling, customizing chat template and so on. Next, we will provide inference examples for scenarios involving multiple images, batch prompts etc.
+LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference [pipeline](./pipeline.md).
+
+Currently, it supports the following models.
+
+- [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)
+- LLaVA series: [v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)
+- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
+
+We genuinely invite the community to contribute new VLM support to LMDeploy. Your involvement is truly appreciated.
+
+This article showcases the VLM pipeline using the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as a case study.
+You'll learn about the simplest ways to leverage the pipeline and how to gradually unlock more advanced features by adjusting engine parameters and generation arguments, such as tensor parallelism, context window sizing, random sampling, and chat template customization.
+Moreover, we will provide practical inference examples tailored to scenarios with multiple images, batch prompts etc.
 
 ## A 'Hello, world' example
 
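For context on the pipeline this hunk documents, a minimal usage sketch follows. It assumes lmdeploy v0.2.6 exposes `pipeline` and an `lmdeploy.vl.load_image` helper as the new doc implies; the model id comes from the doc, while the image URL is illustrative and not taken from this commit.

```python
# Minimal VLM pipeline sketch, assuming lmdeploy v0.2.6.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build a pipeline from a Hugging Face model id; weights are
# downloaded on first use.
pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

# A VLM prompt is a (text, image) pair; the URL is illustrative.
image = load_image('https://raw.githubusercontent.com/open-mmlab/'
                   'mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```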
@@ -89,7 +99,7 @@ print(response)
 
 ### Set chat template
 
-While performing inference, LMDeploy identifies an appropriate chat template from its builtin collection based on the model path and subsequently applies this template to the input prompts. However, when a chat template cannot be told from its model path, users have to specify it. For example, liuhaotian/llava-v1.5-7b employs the 'vicuna' chat template, but the name 'vicuna' cannot be ascertained from the model's path. We can specify it by setting 'vicuna' to `ChatTemplateConfig` as follows:
+While performing inference, LMDeploy identifies an appropriate chat template from its builtin collection based on the model path and subsequently applies this template to the input prompts. However, when a chat template cannot be told from its model path, users have to specify it. For example, [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) employs the 'vicuna' chat template, but the name 'vicuna' cannot be ascertained from the model's path. We can specify it by setting 'vicuna' to `ChatTemplateConfig` as follows:
 
 ```python
 from lmdeploy import pipeline, ChatTemplateConfig
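The hunk is cut off at the import line by the diff's context window. A plausible completion, sketched from the paragraph above; treating `model_name` as the keyword of `ChatTemplateConfig` is an assumption, and the image handling mirrors the basic-usage sketch:

```python
# Hypothetical completion of the truncated snippet: force the
# 'vicuna' chat template via ChatTemplateConfig, as described above.
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.5-7b',
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/'
                   'mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```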

docs/zh_cn/inference/vl_pipeline.md

Lines changed: 13 additions & 2 deletions

@@ -1,7 +1,18 @@
 # VLM Offline Inference Pipeline
 
-LMDeploy abstracts the complex inference process of Vision-Language Models (VLM) into an easy-to-use pipeline, used much like the Large Language Model (LLM) inference [pipeline](./pipeline.md). This article takes the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example and demonstrates the capabilities of the VLM pipeline through several examples.
-First, we show the most basic usage of the pipeline and, building on that, gradually introduce more capabilities through engine configuration and generation settings, such as model parallelism, custom context length, random sampling, and so on. Then we give inference examples for scenarios such as multiple images and batch prompts.
+LMDeploy abstracts the complex inference process of Vision-Language Models (VLM) into an easy-to-use pipeline. Its usage is similar to the Large Language Model (LLM) inference [pipeline](./pipeline.md).
+
+Currently, the VLM pipeline supports the following models:
+
+- [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)
+- LLaVA series: [v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)
+- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
+
+We sincerely invite the community to add support for more VLM models to LMDeploy.
+
+This article takes the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example to show how to use the VLM pipeline. You will learn its most basic usage and how to gradually unlock more advanced features by adjusting engine parameters and generation arguments, such as tensor parallelism, context window sizing, random sampling, and chat template customization.
+
+In addition, we provide practical inference examples for scenarios such as multiple images and batch prompts.
 
 ## "Hello, world" example
 
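The multi-image and batch-prompt scenarios that the new intro mentions are not shown in this diff. A sketch under the same assumptions as the earlier examples; that the pipeline accepts a list of (text, image) pairs is itself an assumption here, and the URLs are illustrative:

```python
# Batch-prompt sketch: one (text, image) pair per request.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg',
]
prompts = [('describe this image', load_image(url)) for url in image_urls]
responses = pipe(prompts)
for r in responses:
    print(r)
```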

lmdeploy/version.py

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.2.5'
+__version__ = '0.2.6'
 short_version = __version__
 
 
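After upgrading, the bump can be sanity-checked directly against the module changed in this commit:

```python
# Verify the installed version matches this release.
from lmdeploy.version import __version__, short_version

assert __version__ == '0.2.6'
print(short_version)
```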

0 commit comments
