|
1 | 1 | # VLM Offline Inference Pipeline |
2 | 2 |
|
3 | | -LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the the Large Language Model (LLM) inference [pipeline](./pipeline.md). |
4 | | -In this article, we will take the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example, exhibiting the powerful capabilities of the VLM pipeline through various examples. |
5 | | -First, we will demonstrate the most basic utilization of the pipeline and progressively unveil additional functionalities by configuring the engine parameters and generation arguments, such as tensor parallelism, setting context window size, and random sampling, customizing chat template and so on. Next, we will provide inference examples for scenarios involving multiple images, batch prompts etc. |
| 3 | +LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference [pipeline](./pipeline.md). |
| 4 | + |
| 5 | +Currently, it supports the following models. |
| 6 | + |
| 7 | +- [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) |
| 8 | +- LLaVA series: [v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2) |
| 9 | +- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B) |
| 10 | + |
| 11 | +We genuinely invite the community to contribute new VLM support to LMDeploy. Your involvement is truly appreciated. |
| 12 | + |
| 13 | +This article showcases the VLM pipeline using the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as a case study. |
| 14 | +You'll learn about the simplest ways to leverage the pipeline and how to gradually unlock more advanced features by adjusting engine parameters and generation arguments, such as tensor parallelism, context window sizing, random sampling, and chat template customization. |
| 15 | +Moreover, we will provide practical inference examples tailored to scenarios involving multiple images, batch prompts, etc. |
6 | 16 |
|
7 | 17 | ## A 'Hello, world' example |
8 | 18 |
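
The section body is collapsed in this diff. For orientation, here is a minimal sketch of such a 'Hello, world' call, assuming LMDeploy's `load_image` helper and its `(prompt, image)` tuple input; the image URL is illustrative:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Construct the pipeline from the model's Hugging Face repo id
pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

# Pair a text prompt with an image; the URL here is a stand-in
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```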
|
@@ -89,7 +99,7 @@ print(response) |
89 | 99 |
|
90 | 100 | ### Set chat template |
91 | 101 |
|
92 | | -While performing inference, LMDeploy identifies an appropriate chat template from its builtin collection based on the model path and subsequently applies this template to the input prompts. However, when a chat template cannot be told from its model path, users have to specify it. For example, liuhaotian/llava-v1.5-7b employs the 'vicuna' chat template, but the name 'vicuna' cannot be ascertained from the model's path. We can specify it by setting 'vicuna' to `ChatTemplateConfig` as follows: |
| 102 | +While performing inference, LMDeploy identifies an appropriate chat template from its built-in collection based on the model path and applies it to the input prompts. However, when the chat template cannot be inferred from the model path, users have to specify it themselves. For example, [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) employs the 'vicuna' chat template, but the name 'vicuna' cannot be ascertained from the model's path. In that case, we can specify the template by passing 'vicuna' to `ChatTemplateConfig` as follows: |
93 | 103 |
|
94 | 104 | ```python |
95 | 105 | from lmdeploy import pipeline, ChatTemplateConfig |
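
The hunk ends before the snippet completes. A full sketch of the example under the same assumptions as above (LMDeploy's `load_image` helper, tuple-style input, illustrative image URL) would be:

```python
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

# Force the 'vicuna' chat template, which cannot be inferred from the model path
pipe = pipeline('liuhaotian/llava-v1.5-7b',
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

Passing `ChatTemplateConfig(model_name='vicuna')` overrides the path-based template lookup, so prompts are formatted with the vicuna conversation format regardless of the model's directory name.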
|