
Commit b69e717

bump version to v0.2.6 (#1299)
1 parent d6c9847 commit b69e717

File tree

5 files changed: +40 -15 lines

- README.md
- README_zh-CN.md
- docs/en/inference/vl_pipeline.md
- docs/zh_cn/inference/vl_pipeline.md
- lmdeploy/version.py


README.md

Lines changed: 6 additions & 4 deletions

@@ -150,12 +150,14 @@ Please overview [getting_started](./docs/en/get_started.md) section for the basi
 For detailed user guides and advanced guides, please refer to our [tutorials](https://lmdeploy.readthedocs.io/en/latest/):
 
 - User Guide
-  - [Inference pipeline](./docs/en/inference/pipeline.md)
-  - [Inference Engine - TurboMind](docs/en/inference/turbomind.md)
-  - [Inference Engine - PyTorch](docs/en/inference/pytorch.md)
-  - [Serving](docs/en/serving/api_server.md)
+  - [LLM Inference pipeline](./docs/en/inference/pipeline.md)
+  - [VLM Inference pipeline](./docs/en/inference/vl_pipeline.md)
+  - [LLM Serving](docs/en/serving/api_server.md)
+  - [VLM Serving](docs/en/serving/api_server_vl.md)
   - [Quantization](docs/en/quantization)
 - Advance Guide
+  - [Inference Engine - TurboMind](docs/en/inference/turbomind.md)
+  - [Inference Engine - PyTorch](docs/en/inference/pytorch.md)
   - [Customize chat templates](docs/en/advance/chat_template.md)
   - [Add a new model](docs/en/advance/pytorch_new_model.md)
   - gemm tuning

README_zh-CN.md

Lines changed: 6 additions & 4 deletions

@@ -151,12 +151,14 @@ print(response)
 To help users learn more about LMDeploy, we have prepared user guides and advanced guides. Please read our [documentation](https://lmdeploy.readthedocs.io/zh-cn/latest/)
 
 - User Guide
-  - [Inference pipeline](./docs/zh_cn/inference/pipeline.md)
-  - [Inference Engine - TurboMind](./docs/zh_cn/inference/turbomind.md)
-  - [Inference Engine - PyTorch](./docs/zh_cn/inference/pytorch.md)
-  - [Serving](./docs/zh_cn/serving/api_server.md)
+  - [LLM Inference pipeline](./docs/zh_cn/inference/pipeline.md)
+  - [VLM Inference pipeline](./docs/zh_cn/inference/vl_pipeline.md)
+  - [LLM Serving](./docs/zh_cn/serving/api_server.md)
+  - [VLM Serving](./docs/zh_cn/serving/api_server_vl.md)
   - [Model Quantization](./docs/zh_cn/quantization)
 - Advanced Guide
+  - [Inference Engine - TurboMind](./docs/zh_cn/inference/turbomind.md)
+  - [Inference Engine - PyTorch](./docs/zh_cn/inference/pytorch.md)
   - [Customize chat templates](./docs/zh_cn/advance/chat_template.md)
   - [Add a new model](./docs/zh_cn/advance/pytorch_new_model.md)
   - gemm tuning

docs/en/inference/vl_pipeline.md

Lines changed: 14 additions & 4 deletions

@@ -1,8 +1,18 @@
 # VLM Offline Inference Pipeline
 
-LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the the Large Language Model (LLM) inference [pipeline](./pipeline.md).
-In this article, we will take the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example, exhibiting the powerful capabilities of the VLM pipeline through various examples.
-First, we will demonstrate the most basic utilization of the pipeline and progressively unveil additional functionalities by configuring the engine parameters and generation arguments, such as tensor parallelism, setting context window size, and random sampling, customizing chat template and so on. Next, we will provide inference examples for scenarios involving multiple images, batch prompts etc.
+LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference [pipeline](./pipeline.md).
+
+Currently, it supports the following models.
+
+- [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)
+- LLaVA series: [v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)
+- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
+
+We genuinely invite the community to contribute new VLM support to LMDeploy. Your involvement is truly appreciated.
+
+This article showcases the VLM pipeline using the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as a case study.
+You'll learn about the simplest ways to leverage the pipeline and how to gradually unlock more advanced features by adjusting engine parameters and generation arguments, such as tensor parallelism, context window sizing, random sampling, and chat template customization.
+Moreover, we will provide practical inference examples tailored to scenarios with multiple images, batch prompts etc.
 
 ## A 'Hello, world' example
 
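For context on the pipeline this hunk documents, a minimal usage sketch follows. It assumes lmdeploy v0.2.6 exposes `pipeline` and an `lmdeploy.vl.load_image` helper as the new doc implies; the model id comes from the doc, while the image URL is illustrative and not taken from this commit.

```python
# Minimal VLM pipeline sketch, assuming lmdeploy v0.2.6.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build a pipeline from a Hugging Face model id; weights are
# downloaded on first use.
pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

# A VLM prompt is a (text, image) pair; the URL is illustrative.
image = load_image('https://raw.githubusercontent.com/open-mmlab/'
                   'mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```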
@@ -89,7 +99,7 @@ print(response)
 
 ### Set chat template
 
-While performing inference, LMDeploy identifies an appropriate chat template from its builtin collection based on the model path and subsequently applies this template to the input prompts. However, when a chat template cannot be told from its model path, users have to specify it. For example, liuhaotian/llava-v1.5-7b employs the 'vicuna' chat template, but the name 'vicuna' cannot be ascertained from the model's path. We can specify it by setting 'vicuna' to `ChatTemplateConfig` as follows:
+While performing inference, LMDeploy identifies an appropriate chat template from its builtin collection based on the model path and subsequently applies this template to the input prompts. However, when a chat template cannot be told from its model path, users have to specify it. For example, [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) employs the 'vicuna' chat template, but the name 'vicuna' cannot be ascertained from the model's path. We can specify it by setting 'vicuna' to `ChatTemplateConfig` as follows:
 
 ```python
 from lmdeploy import pipeline, ChatTemplateConfig
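The hunk is cut off at the import line by the diff's context window. A plausible completion, sketched from the paragraph above; treating `model_name` as the keyword of `ChatTemplateConfig` is an assumption, and the image handling mirrors the basic-usage sketch:

```python
# Hypothetical completion of the truncated snippet: force the
# 'vicuna' chat template via ChatTemplateConfig, as described above.
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.5-7b',
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/'
                   'mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```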

docs/zh_cn/inference/vl_pipeline.md

Lines changed: 13 additions & 2 deletions

@@ -1,7 +1,18 @@
 # VLM Offline Inference Pipeline
 
-LMDeploy abstracts the complex inference process of Vision-Language Models (VLM) into an easy-to-use pipeline, used much like the Large Language Model (LLM) inference [pipeline](./pipeline.md). This article takes the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example and demonstrates the capabilities of the VLM pipeline through several examples.
-First, we show the most basic usage of the pipeline and, building on that, gradually introduce more capabilities through engine configuration and generation settings, such as model parallelism, custom context length, random sampling, and so on. Then we give inference examples for scenarios such as multiple images and batch prompts.
+LMDeploy abstracts the complex inference process of Vision-Language Models (VLM) into an easy-to-use pipeline. Its usage is similar to the Large Language Model (LLM) inference [pipeline](./pipeline.md).
+
+Currently, the VLM pipeline supports the following models:
+
+- [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)
+- LLaVA series: [v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)
+- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
+
+We sincerely invite the community to add support for more VLM models to LMDeploy.
+
+This article takes the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example to show how to use the VLM pipeline. You will learn its most basic usage and how to gradually unlock more advanced features by adjusting engine parameters and generation arguments, such as tensor parallelism, context window sizing, random sampling, and chat template customization.
+
+In addition, we provide practical inference examples for scenarios such as multiple images and batch prompts.
 
 ## "Hello, world" example
 
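The multi-image and batch-prompt scenarios that the new intro mentions are not shown in this diff. A sketch under the same assumptions as the earlier examples; that the pipeline accepts a list of (text, image) pairs is itself an assumption here, and the URLs are illustrative:

```python
# Batch-prompt sketch: one (text, image) pair per request.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg',
]
prompts = [('describe this image', load_image(url)) for url in image_urls]
responses = pipe(prompts)
for r in responses:
    print(r)
```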

lmdeploy/version.py

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.2.5'
+__version__ = '0.2.6'
 short_version = __version__
 
 
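After upgrading, the bump can be sanity-checked directly against the module changed in this commit:

```python
# Verify the installed version matches this release.
from lmdeploy.version import __version__, short_version

assert __version__ == '0.2.6'
print(short_version)
```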

0 commit comments
